Quine adopts Tarski's definition of truth directly and holds that the function of the truth predicate is disquotation, that is, it effects the transition from mentioning a sentence to using it. Horwich's minimalism is likewise made up of instances of the form (T), the difference being that it takes propositions rather than sentences as truth bearers. On Quine's and Horwich's understanding, Tarski's theory of truth is undoubtedly a deflationary theory. Davidson, although he does not endorse that reading, explicitly denies that Tarski and Aristotle are correspondence theorists, on the grounds that their definitions or concepts of truth lack the notion of fact or state of affairs that a correspondence theory requires ([8], p. 268). Sher, by contrast, holds that both Tarski's and Aristotle's concepts of truth are correspondence concepts, because the fundamental idea behind them is that whether a sentence is true depends not only on what the sentence says but also on how things are in the world ([25], p. 135).
② In physics, "deflation/inflation" means letting air out of or pumping air into something; in economics it means monetary deflation or inflation. In the theory of truth, "deflate/inflate" borrows the physical metaphor, as in "deflate the overinflated balloons offered by substantivists" ([6], p. 4). Accordingly, the author renders "deflationism/inflationism" in Chinese as 收缩论/膨胀论.
[1] W. Alston, 2002, "Truth: Concept and property", in R. Schantz (ed.), What is Truth?, pp. 11-26, New York: de Gruyter.
[2] J. Asay, 2013, The Primitivist Theory of Truth, Cambridge: Cambridge University Press.
[3] P. Boghossian, 1990, "The status of content", Philosophical Review, 99(2): 157-184.
[4] J. Cleve, 1996, "Minimal truth is realist truth", Philosophy and Phenomenological Research, 56(4): 869-875.
[5] N. Damnjanovic, 2010, "New wave deflationism", in C. D. Wright and N. J. L. L. Pedersen (eds.), New Waves in Truth, pp. 45-58, New York: Palgrave Macmillan.
[6] M. David, 1994, Correspondence and Disquotation: An Essay on the Nature of Truth, Oxford: Oxford University Press.
[7] D. Davidson, 1990, "The structure and content of truth", Journal of Philosophy, 87(6): 279-328.
[8] D. Davidson, 1996, "The folly of trying to define truth", Journal of Philosophy, 93(6): 263-278.
[9] M. Dummett, 1991, The Logical Basis of Metaphysics, Cambridge, MA: Harvard University Press.
[10] D. Edwards, 2013, "Truth as a substantive property", Australasian Journal of Philosophy, 91(2): 279-294.
[11] H. Field, 1972, "Tarski's theory of truth", Journal of Philosophy, 69(13): 347-375.
[12] H. Field, 1986, "The deflationary conception of truth", in G. Macdonald and C. Wright (eds.), Fact, Science and Morality: Essays on A. J. Ayer's Language, Truth and Logic, pp. 55-117, Oxford: Blackwell.
[13] H. Field, 1994, "Deflationist views of meaning and content", Mind, 103(411): 249-285.
[14] H. Field, 1994, "Disquotational truth and factually defective discourse", Philosophical Review, 103(3): 405-452.
[15] H. Glock, 2008, What is Analytic Philosophy?, Cambridge: Cambridge University Press.
[16] P. Horwich, 1982, "Three forms of realism", Synthese, 51(2): 181-201.
[17] P. Horwich, 1998, Truth (2nd edition), Oxford: Oxford University Press.
[18] P. Horwich, 2018, "Is truth a normative concept?", Synthese, 195(3): 1127-1138.
[19] R. Kirkham, 1992, Theories of Truth: A Critical Introduction, Cambridge, MA: MIT Press.
[20] W. Künne, 2003, Conceptions of Truth, Oxford: Oxford University Press.
[21] D. Lewis, 1983, "New work for a theory of universals", Australasian Journal of Philosophy, 61(4): 343-377.
[22] M. Lynch, 2009, Truth as One and Many, Oxford: Oxford University Press.
[23] D. Patterson, 2003, "What is a correspondence theory of truth?", Synthese, 137(3): 421-444.
[24] W. V. Quine, 1970, Philosophy of Logic, Englewood Cliffs: Prentice-Hall.
[25] G. Sher, 1998, "On the possibility of a substantive theory of truth", Synthese, 117(1): 133-172.
[26] G. Sher, 2004, "In search of a substantive theory of truth", Journal of Philosophy, 101(1): 5-36.
Metabolic diseases are a class of diseases caused by disturbances of the body's metabolism. Among them, metabolic bone disease is one of the types most commonly seen in ancient human remains. It refers to diseases, or combinations of diseases, that cause systemic changes in normal bone formation, resorption, or mineralization, and it is mostly associated with malnutrition and hormonal imbalance. According to modern medical research, the early manifestations of metabolic bone disease are atypical, while the clinical picture in the middle and late stages is complex, commonly including growth disturbance, osteoarthropathy, and skeletal deformity. In its middle and late stages, metabolic bone disease leaves pronounced physical traces on the skeleton that are easy to observe and identify in archaeological material; consequently, these conditions are observed at comparatively high rates in ancient human remains, chiefly osteoporosis, skeletal fluorosis, rickets, and scurvy, and multidisciplinary research on metabolic bone disease continues to develop.
The causes of osteoporosis are complex; it is generally thought to be related to age, sex, diet, and nutritional status, and particular lifestyles may also induce OP, which provides a clue for investigating differences in social division of labor and ways of life in ancient populations. For example, M. E. Zaki and colleagues measured bone mineral density (BMD) in ancient Egyptian skeletal remains dated to 2687-2191 BC, drawn from two different social strata: high officials and workers. The results showed that BMD was associated with age, sex, and social status. With respect to age and status, BMD declined markedly in older individuals relative to younger ones, the prevalence of osteoporosis was higher in male workers than in male high officials, and among women it was higher in high officials than in workers. The researchers argued that the causes differed between groups, suggesting that the higher prevalence among male workers may reflect inadequate nutrition and excessive workloads, while the sedentary lifestyle of high-status women was a potential contributing factor. In addition, osteoporosis began earlier and occurred more frequently in women than in men, a pattern that may be related to hormonal changes at menopause. Chinese scholars have carried out a range of studies of osteoporosis in ancient human remains. Zheng Xiaoying, for instance, performed X-ray pathological examination of Bronze Age human bones unearthed at the Ganguya cemetery in Jiuquan, Gansu, confirming evidence of skeletal fluorosis and the possibility of osseous hydatid disease and bone tumors, and also finding that the age of onset of osteoporosis in the sample tended to be lower than in modern people, a difference that may reflect the particular living environment and conditions of ancient times. Wang Minghui compared the prevalence of osteoporosis in human bones from the Jiahu site and the Xipo cemetery, arguing that the high prevalence in the agricultural population at Xipo, apart from possible calcium-depleting diseases, should be related to differences in diet and nutritional status between the populations, and that the early transition in subsistence strategy may have raised the incidence of osteoporosis.
Evidence of scurvy in ancient human remains now spans several millennia and nearly the entire world; some of the earliest cases come from Germany, Greece, and Jordan around 3000 BC. Early reported cases were concentrated among adults, but as diagnostic methods have improved, the great majority of cases are now identified in juveniles. Through studies of ancient populations in different regions, scholars have gained a deeper understanding of the pathological features and causes of ancient scurvy. Haagen D. Klaus found imprints of extracranial blood vessels on the surface of juvenile skeletons excavated in South America, indicating that ancient juvenile patients may have developed hematomas, thereby adding to the known physical features of ancient scurvy. Anne Marie E. Snoddy and colleagues observed nonspecific skeletal abnormalities in four neonatal remains from the Atacama Desert of Chile dated to about 3,400 years ago; one of the neonates was biologically related to an adult woman with scurvy excavated in the same area, suggesting that resource shortages during the region's agricultural transition may have harmed the health of both mothers and fetuses, although, given the limits of diagnostic techniques, applying current diagnostic criteria for scurvy to neonatal remains still faces many challenges. Chryssi Bourbou, studying juvenile remains from eleventh- to twelfth-century Greece, identified evidence of juvenile scurvy, enriching the historical record of the disease in that region and proposing that its occurrence may be related to the kind and quality of solid foods consumed after weaning. Taken together, these studies indicate that the probability of scurvy was very likely tied to lifestyle, access to resources, and cultural factors.
Research on ancient disease has always faced challenges such as fragmentary evidence and the complexity of disease expression, and single-discipline approaches are often unable to reveal the essential features and evolution of a disease comprehensively and accurately. Against this background, a multidisciplinary research paradigm has gradually become the core approach to the study of ancient disease. This paradigm integrates the theories and techniques of traditional physical anthropology, ancient biomolecular research, archaeology, history, and other disciplines. Through such interdisciplinary integration, researchers can not only obtain direct information from physical features such as bone lesions, but also use techniques such as ancient DNA analysis and protein detection to probe, at the molecular level, the mechanisms, evolutionary trajectories, and transmission routes of disease. Whether in explaining the pathogenesis of metabolic diseases or in diagnosing specific skeletal infections and building spatio-temporal frameworks for disease, the multidisciplinary paradigm shows clear advantages and helps us understand systematically the emergence and development of ancient diseases and their impact on human health, society, and culture.
Paget disease of bone (PDB), also called osteitis deformans, is a chronic disorder of bone metabolism. At the cellular level its pathology involves enlarged and more numerous osteoclasts accompanied by increased but poorly mineralized osteoblastic activity, so that bone formation is accelerated six- to seven-fold. This metabolic abnormality produces disorganized new bone: the affected bones are enlarged, outwardly hard, but of poor quality. The disease most often involves the pelvis, skull, and long bones and readily gives rise to complications such as fractures, osteosarcoma, and arthritis. The etiology of PDB remains unclear; early views linked it to zoonotic infection, while modern genetic studies show a familial tendency, with defects more likely in three genes: CSF1, OPTN, and TNFRSF11A.
Achondroplasia has existed for thousands of years; judging from historical records and depictions in art, ancient Egypt should have had many cases, and the earliest skeletal evidence currently known comes from the Middle Neolithic of France. Molecular biology has further advanced understanding of the condition. Lucas L. Boer and colleagues detected the heterozygous G1138A variant of the FGFR3 gene in a 180-year-old skeletal sample with achondroplasia, providing strong historical confirmation that the condition is caused by a pathogenic missense mutation of FGFR3. For ancient skeletal remains whose physical features are atypical or ambiguous, demonstrating the specific pathogenic FGFR3 variant through ancient biomolecular analysis can assist diagnosis.
Tumors fall into two broad classes, benign and malignant. Benign tumors usually grow on their own at the primary site and spread only locally; malignant tumors are primary growths that spread without restraint to other organs of the body, and spontaneous fractures may also occur at the site of the tumor. Benign tumors identifiable from physical features in ancient samples, in China and abroad, include osteoma, osteoid osteoma, and hemangioma of bone. Hemangiomas of bone are benign tumors arising from the blood vessels of bone, consisting of tumor-like proliferations of vascular tissue interspersed among the trabeculae; they favor flat bones such as the vertebrae, skull, and jaws, are rare in long bones, and are divided into cavernous and capillary types. The cortical destruction by which a hemangioma can be recognized in archaeological samples is a late-stage feature and occurs mostly in elderly individuals, so the prevalence of bone hemangiomas seen in archaeological samples is far lower than in modern clinical samples; Joseph E. Molto and colleagues, for example, diagnosed late-stage vertebral hemangiomas (VHs) in the skeleton of an elderly Roman-period woman excavated in Egypt. Malignant tumors include osteosarcoma, multiple myeloma, and metastatic carcinoma of bone. Metastatic carcinoma refers to secondary malignancies in which a primary tumor of some organ spreads to bone via the bloodstream, the lymphatic system, or the cerebrospinal fluid, destroying bone through further metastasis or direct infiltration. For example, Zhang Qun and colleagues carried out physical observation of an Eastern Han individual with cranial defects excavated from the Shiyanzi cemetery in Ningxia; the thickening and deepening of vascular impressions indicated the abnormal vasculature that nourishes tumors, and they inferred that the extensive bone destruction on the skull was caused by metastatic carcinoma.
The breakthrough contribution of ancient biomolecular research lies in breaking down the spatio-temporal barrier between pathogen evolution and human history, confirming for the first time at the molecular level that disease has been a central force shaping population movements and the configuration of civilizations. Reconstruction of Yersinia pestis genomes has revealed how the pathogen spread along Eurasian exchange networks at the end of the Neolithic, corroborating the historical logic that material exchange and disease transmission occurred together and filling a gap in our knowledge of how prehistoric epidemics affected population structure and cultural replacement. Ancient DNA analyses of malaria and of hepatitis B virus have not only traced pathogen transmission routes during the transatlantic slave trade and Eurasian migrations, but also uncovered evidence of gene-level co-evolution between humans and pathogens (such as natural selection on anti-malarial genes), making us aware that adaptation for health is itself an important driver of human evolution. This shift has moved the history of disease from a marginal topic to one of the central dimensions for understanding the course of human history. The maturation of the multidisciplinary paradigm, in turn, marks a new stage in which the study of ancient disease can systematically interpret the interaction of civilizations. By integrating pathological observation from physical anthropology, genetic evidence from ancient biomolecules, dietary analysis from stable isotopes, and the cultural context supplied by archaeology, we can build a complete chain of understanding linking disease, society, and civilization. Research on achondroplasia, for example, not only confirms the disease through the FGFR3 variant but also uses burial location and community layout to infer how ancient societies accepted physical difference; multidisciplinary analyses of tumors, combining histology, ancient DNA, and isotopes, have revealed links between diet (such as high meat intake) and genetic variation (such as K-ras mutations), offering a rounded view of how ancient ways of life shaped disease. This interdisciplinary fusion goes well beyond the limits of any single discipline and makes possible a deeper inquiry into how disease has participated in the evolution of civilization.
The nature of catastrophe-theory models is best explained by example, and we begin with a model of aggression in dogs. Konrad Z. Lorenz pointed out that aggressive behavior is governed by two conflicting tendencies, rage and fear, and that in dogs both factors can to some degree be measured. A dog's rage is reflected in how far its mouth is opened and its teeth are bared, while its fear can be read from how far its ears are flattened back. Using facial expression as an index of the dog's emotional state, we can hope to work out how the dog's behavior changes as its emotions change.
By increasing the dimensions of the control space and of the behavior space, an unlimited sequence of catastrophes can be constructed. The Russian mathematician V. I. Arnold has classified them up to at least dimension 25. In models of real-world phenomena, however, the seven types described above are probably the most important, because their control spaces have no more than four dimensions. Processes governed by position in space and by time cannot have a control space of more than four dimensions, because our world has only three spatial dimensions and one of time.
We consider only those successive combinations of words that actually occur in the training corpus, or that occur often enough. What happens when a new combination of n words appears that was not seen in the training corpus? We do not want to assign such cases zero probability, because new combinations of this kind are likely to occur, and they will occur all the more frequently the larger the context window. A simple answer is to look at the probability predicted with a smaller context window, as is done in back-off trigram models (Katz, 1987) or in smoothed (interpolated) trigram models (Jelinek and Mercer, 1980). How, then, does generalization from word sequences observed in the training corpus to new sequences work in such models? One way to understand it is to imagine the generative model corresponding to these interpolated or back-off n-gram models. In essence, a new word sequence is generated by "gluing together" very short, overlapping fragments of length 1, 2, up to n that occur frequently in the training data. The rules for obtaining the probability of the next piece are implicit in the particular back-off or interpolated n-gram implementation. Researchers have typically used n = 3 (trigrams) and obtained state-of-the-art results, but Goodman (2001) showed that combining many tricks yields substantial improvements. Clearly, there is more information in the sequence preceding the word to be predicted than just the identities of the previous couple of words. There are at least two characteristics of this approach that call for improvement, and they are the directions we focus on in this paper. First, it does not take into account context farther than one or two words away; second, it ignores the "similarity" between words. For example, having seen the sentence "The cat is walking in the bedroom" in the training corpus should help us generalize that "A dog was running in a room" is almost as likely, simply because "dog" and "cat" (and likewise "the" and "a", "room" and "bedroom", and so on) play similar semantic and grammatical roles.
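As a minimal sketch of the interpolation idea (with fixed mixture weights rather than the tuned weights a real system would use, and a toy corpus invented for illustration):

```python
from collections import Counter

def train_counts(corpus):
    """Count unigrams, bigrams, and trigrams in a tokenized corpus (a list of words)."""
    uni, bi, tri = Counter(), Counter(), Counter()
    for i, w in enumerate(corpus):
        uni[w] += 1
        if i >= 1:
            bi[(corpus[i - 1], w)] += 1
        if i >= 2:
            tri[(corpus[i - 2], corpus[i - 1], w)] += 1
    return uni, bi, tri

def interpolated_prob(w, u, v, uni, bi, tri, total, lambdas=(0.6, 0.3, 0.1)):
    """P(w | u, v) as a fixed-weight mixture of trigram, bigram, and unigram estimates.
    The shorter contexts take over exactly when the longer ones were never observed."""
    l3, l2, l1 = lambdas
    p3 = tri[(u, v, w)] / bi[(u, v)] if bi[(u, v)] else 0.0
    p2 = bi[(v, w)] / uni[v] if uni[v] else 0.0
    p1 = uni[w] / total if total else 0.0
    return l3 * p3 + l2 * p2 + l1 * p1

corpus = "the cat is walking in the bedroom the dog was running in a room".split()
uni, bi, tri = train_counts(corpus)
print(interpolated_prob("bedroom", "in", "the", uni, bi, tri, len(corpus)))
```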
If the parallel computer is a network of CPUs, the large volume of parameters to exchange (close to 100 MB for the largest networks) means that frequently transmitting all parameters between processors would exceed the capacity of the local network. We therefore parallelize across the parameters, focusing on the parameters of the output units, which account for the bulk of the computation in our architecture. Each CPU is responsible for computing the unnormalized probabilities of a subset of the output units and for updating the weights of those units. This scheme yields efficient parallel stochastic gradient ascent with very little communication: the CPUs need to exchange only two kinds of data, (1) the normalization factor of the output softmax, and (2) the gradients with respect to the hidden layer (denoted a below) and the word-feature layer (denoted x below). All CPUs duplicate the computations that precede the output-unit activations, namely the selection of word features, the computation of the hidden-layer activations, and the corresponding back-propagation and update steps. For our architecture, however, these computations represent only a very small fraction of the total.
Consider, for example, the architecture used in the experiments on the Associated Press (AP) News data: vocabulary size |V| = 17,964, h = 60 hidden units, model order n = 6, and word-feature dimension m = 100. The total number of operations needed to process a single training example is roughly |V|(1 + nm + h) + h(1 + nm) + nm (the terms correspond, respectively, to the computations at the output units, the hidden units, and the feature units for the n = 6 context words). In this example, the weighted sums for the output units account for about 99.7% of the total computation. The figure is approximate, since different operations consume different amounts of CPU time, but it shows that parallelizing the output-unit computation is generally advantageous. For the degree of parallelization pursued here (a few dozen processors), having all CPUs duplicate a very small fraction of the computation does not significantly affect total computation time. If the number of hidden units were very large, parallelizing their computation would also become profitable, but we did not investigate that approach in our experiments.
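The 99.7% figure can be checked directly from the operation counts quoted above:

```python
# Rough operation counts per training example for the AP News configuration above.
V, h, n, m = 17964, 60, 6, 100

output_units  = V * (1 + n * m + h)   # weighted sums feeding the |V| output units
hidden_units  = h * (1 + n * m)       # weighted sums feeding the h hidden units
feature_units = n * m                 # copying the n word-feature vectors

total = output_units + hidden_units + feature_units
print(total)                 # ~11.9 million operations per example
print(output_units / total)  # ~0.997, i.e. ~99.7% of the work is in the output layer
```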
The strategy was implemented on a cluster of 1.2 GHz Athlon processors (32 machines with 2 CPUs each) connected by a Myrinet network (a low-latency gigabit local area network), with parallelization handled through the MPI (Message Passing Interface) library (Dongarra et al., 1995). The following is a brief description of the parallel algorithm for a single example (w_{t-n+1}, ..., w_t), executed in parallel by CPU i out of the M processors in the cluster. CPU i (with i ranging from 0 to M-1) is responsible for a block of output units starting at start_i = i × ceil(|V|/M), of length min(ceil(|V|/M), |V| - start_i).
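A small sketch of the block assignment just described (start_i and block length), assuming, for illustration, M = 64 CPUs:

```python
import math

def output_block(i, M, V):
    """Output-unit block handled by CPU i out of M, for a vocabulary of size V.
    The block starts at i*ceil(V/M) and is truncated at the end of the vocabulary."""
    size = math.ceil(V / M)
    start = i * size
    length = min(size, V - start)        # the last CPUs may get a shorter (or empty) block
    return start, max(length, 0)

V, M = 17964, 64                          # vocabulary size from the text; M is an assumption
blocks = [output_block(i, M, V) for i in range(M)]
assert sum(length for _, length in blocks) == V   # every output unit is covered exactly once
print(blocks[0], blocks[-1])
```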
A variant of the network described above can be interpreted as an energy-minimization model in the spirit of Hinton's recent work on products of experts (Hinton, 2000). In the neural network described previously, distributed word features are used only for the "input" words and not for the "output" word (the next word). Furthermore, the output layer adds a very large number of parameters (the majority), and it does not exploit the semantic or syntactic similarity between output words. In the variant described here, the output word is also represented by its feature vector. The network takes a subsequence of words (mapped to their feature vectors) as input and outputs an energy function E that is low when the words form a plausible subsequence and high when they do not. For example, the "energy" output by the network is E(w_{t-n+1}, ..., w_t) = v'·tanh(d + Hx) + b_{w_{t-n+1}} + ... + b_{w_t},
where b is the vector of biases (which correspond to unconditional probabilities), d is the vector of hidden-unit biases, v is the output weight vector, and H is the hidden-layer weight matrix. Unlike in the previous model, the input and the output words all contribute to x: x is the concatenation of the learned feature vectors of all n words, w_{t-n+1} through w_t.
The energy function E(w_{t-n+1}, ..., w_t) can be interpreted as an unnormalized log-probability of the joint occurrence of (w_{t-n+1}, ..., w_t). To obtain the conditional probability P̂(w_t | w_{t-n+1}, ..., w_{t-1}) it suffices (though it is costly) to normalize over the possible values of w_t, as follows: P̂(w_t | w_{t-n+1}, ..., w_{t-1}) = e^{-E(w_{t-n+1}, ..., w_t)} / Σ_w e^{-E(w_{t-n+1}, ..., w_{t-1}, w)}.
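A small numerical sketch of this normalization, with made-up energies for three candidate next words (lower energy meaning a more plausible sequence):

```python
import numpy as np

def conditional_probs(energies):
    """Turn the energies E(w_{t-n+1},...,w_{t-1}, w), computed for every candidate next
    word w, into the conditional distribution P(w_t | context) = e^{-E} / sum_w e^{-E}."""
    e = np.asarray(energies, dtype=float)
    z = np.exp(-(e - e.min()))          # subtract the minimum for numerical stability
    return z / z.sum()

# Toy example with three candidate next words; the energy values are invented.
print(conditional_probs([1.2, 3.5, 0.7]))
```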
Tell el-Sultan, 25 kilometers northeast of Jerusalem, is a place name that recurs in the Bible. In the 1870s, archaeological excavation confirmed that it was in fact part of the ancient city of Jericho, which the Bible mentions even more often. Later, more than twenty sites of early human settlement were gradually uncovered in the area, most of them more than 4,000 years old. From 1952 to 1958, the British archaeologist Kathleen Kenyon (1906-1978) directed systematic excavation of new strata and completely changed our view of history: the earliest human remains at Jericho are more than ten thousand years old.
In 1968, during construction of the Tabqa Dam on the Euphrates in Syria, a site was unearthed that humans had occupied for nearly 4,000 years (from about 11,000 to 7,500 years ago): Tell Abu Hureyra. It records the transition from a hunting-and-gathering way of life to farming, and its inhabitants have therefore been called the world's first farmers. From the soil and from animal and fish bones at the site, 712 seed samples were recovered, ultimately identified as more than 500 kinds of plant seeds from over 150 types of edible plants. This Syrian site brings back to life the hunting-and-gathering way of life of 11,000 years ago and the outlines of the first farming that began roughly ten thousand years ago.
The abundant edible plant seeds and cultivation techniques of the Karacadag highlands spread through Lebanon, Israel, Syria, and Iraq all the way to the Mediterranean coast. The most famous sites along the way include Jericho, Tell Abu Hureyra in Syria, and Catal Huyuk in Turkey. (Catal Huyuk was excavated between 1950 and 1990; its fourteen Neolithic layers are 15 meters thick and date to 8,850 years ago, by which time the so-called Chalcolithic, the name Middle Eastern scholars give to the age of metal mixed with stone, had begun, and its artistic and religious artifacts are extraordinarily rich.)
This research produced no real results. Blood types and cell-surface protein markers could not establish human lineages, nor pin down migration routes; the research techniques of the time were not up to it. But Cavalli-Sforza did find that agriculture was not a purely cultural phenomenon: it was accompanied by rapid population growth, a tide that spread from southeastern to northwestern Europe and later came to be called the "Wave of Advance." Many people accepted the Wave of Advance, but Cavalli-Sforza himself did not, because the origin of the European gene pool had not yet been worked out.
Deirdre Joy of the National Institutes of Health and her colleagues found that the malaria parasite began to diversify about 50,000 years ago, exactly the period when humans were leaving Africa, hinting that humans carried the parasite with them around the world. Joy also found further evidence that around 10,000 years ago the parasite diversified on a large scale, precisely the time of the Neolithic revolution and the origin of agriculture.
G6PD is an enzyme in our cells that converts glucose into a subcellular energy packet called NADPH, a source of the energy that powers human cells. The grain we eat, carbohydrate (also called polysaccharide), is converted into the simple sugar glucose and finally into the three forms of energy in our cells: NADPH, NADH, and ATP. G6PD is therefore extremely important.
The story of Captain William Bligh (1754-1817), The Mutiny of the Bounty, has been filmed five times. In 1789, the Bounty under his command reached Tahiti after a six-month voyage. He had treated his sailors harshly throughout the passage, and on arrival at Tahiti he ordered them not to seek out local women, for fear of spreading venereal disease.
The first scholar to point out this risk was the Japanese-American biologist Susumu Ohno (1928-2000), who argued in his 1970 book Evolution by Gene Duplication that careless, arbitrary choices when genes are duplicated produce "rapidly evolving" variants, so backup copies must be kept. He coined the term "junk DNA" to describe the large amount of DNA in the genome whose function is unknown. Such junk is the inevitable fate of duplicated genes: it may be meaningless, or its consequences may be deadly.
Great Smoky Mountains National Park, a World Natural Heritage site, is one of the most visited national parks in the United States, with 9 to 9.5 million visitors a year. Dollywood, a large amusement park in eastern Tennessee, draws more than 2 million visitors a year. Anyone who visits the Smokies or Dollywood will find obese people almost everywhere.
In 1991, no U.S. state had an obesity rate above 20 percent. A change occurring within barely twenty years cannot be explained by genetic change. More than 85 percent of Americans now regard obesity as a disease. Surveys by the Centers for Disease Control and Prevention and the World Health Organization confirm that obesity is the second-largest epidemic after smoking and will become the world's largest within ten years. (Modern people eat far more than they actually need. Using oxygen as raw material, our mitochondria produce ATP each day weighing about half as much as the body itself, generating energy for us at an efficiency of some 200,000-fold.)
The number-one culprit in the modern diet is sugar. Because human genes cannot handle excess sugar (carbohydrate), the result is diabetes. Another major culprit is additives. In 2002, Eric Schlosser (1959-) published the book-length investigation Fast Food Nation: The Dark Side of the All-American Meal. It is full of telling figures: a McDonald's strawberry milkshake, for example, is made from more than sixty additives and plenty of sugar, with no strawberry in it at all, and a third of ketchup is sugar. The report caused a sensation, and the United States saw a wave of similar books and several films criticizing and rethinking modern food culture.
The pastoral idyll of the hunter-gatherer age cannot be brought back. (An old Native American saying: Treat the earth well. It was not given to you by your parents, but is loaned to you by your children.)
Agricultural culture has assumed that the earth can be drawn on without limit, and in the past few centuries its unrestricted expansion and plunder have approached madness. But land has its limits, and so does the earth. The old model of agricultural growth and progress now faces the fatal challenge of resource exhaustion and cannot be sustained. We cannot return to a pre-agricultural age, but the human cultures of the hunter-gatherer era deserve our reflection and borrowing. Population and people are two entirely different concepts. Thomas Robert Malthus (1766-1834) wrote: "The power of population is indefinitely greater than the power in the earth to produce subsistence for man."
In 1900 an uncivil war broke out: Mendel's genetics against Darwin's natural selection. Most biologists believed the war would end with one theory exterminating the other. Hugo de Vries, one of the three scientists who had resurrected Mendel, was the first to put forward a mutation theory, holding that the origin of species was triggered by certain rare mutations.
Morgan's little New York laboratory had been absurdly cramped. In 1928, having become a "big name" in biology, he moved into a spacious, well-lit new laboratory in Los Angeles, California, ambitious to build a theoretical system of his own, even though his fruit-fly experiments and his mutation theory in fact followed other people's experimental models and theories. The biology division he founded at the California Institute of Technology went on to produce seven Nobel laureates.
Better evidence arrived in 1952. At the Cold Spring Harbor Laboratory in New York, Alfred Day Hershey (1908-1997) and his assistant Martha Cowles Chase (1927-2003) used viruses in the famous Hershey-Chase experiment, which confirmed that DNA is the genetic material.
A year later, in 1953, two young men at Cambridge, James Watson and Francis Crick, finally worked out the strange yet stable chemical structure of the DNA molecule. In 1954, twenty young researchers (standing for the twenty amino acids) formed the RNA Tie Club to discuss and analyze the genetic relationship DNA to RNA to protein: DNA is double-stranded, RNA single-stranded; DNA hands its genetic information to "messenger" RNA, and RNA then instructs the cell to make protein. This relay and expression of genetic information is fleeting, and its mechanism was hard to pin down. Errors do of course occur in the DNA-RNA-protein manufacturing process, but the cell usually corrects them at once; otherwise the errors remain permanently in the DNA and are passed on.
In 1958, Francis Crick, one of the two discoverers of the structure of DNA, announced the famous "central dogma of molecular biology." Its core meaning: DNA makes RNA makes protein.
(In 1979 Lovelock published Gaia: A New Look at Life on Earth, the first of his books on the "Gaia theory." He published many works, such as: Lovelock, James. Gaia: A New Look at Life on Earth. Oxford University Press, Oxford, England. Lovelock's idea was, in fact, not entirely new.)
The giant sequoia (Sequoia gigantea) of California is the best annotation of life. These giant trees grow in groves, reach heights of more than 100 meters, and live more than 3,000 years. Ninety-seven percent of a sequoia's tissue is dead: the trunk and bark are dead, and only the layer of cells at the surface of the trunk is alive. The trunk is like the earth's lithosphere, with only the thin biosphere at its surface alive; the bark is like the atmosphere, protecting that biosphere and carrying out the biologically vital exchange of gases, carbon dioxide for oxygen. Beyond question the sequoia as a whole is a living thing; we cannot call only its outer layer the sequoia and treat the rest as dead wood.
On September 5, 2012, people discovered once again that they had been wrong. Starting that day, a report in Time, the world's largest news medium, carried a title that was itself an admission of error: "Junk DNA - Not So Useless After All." The report stayed on the front page of Time's website for five days running, and the news led the front pages of media around the world.
YAP is short for Y Alu Polymorphism. Alu is a segment of about 300 base pairs (nucleotides) on the Y chromosome, also called the Alu element; this harmless Alu has inserted itself again and again into different parts of the human genome, in more than a million patterns, and is passed on to descendants. About 50,000 years ago this 300-base-pair segment appeared on the Y chromosome of one man and was inherited by his descendants.
Defenders of slavery once held that modern humans comprise many species and subspecies, and that colonizers and slaves were not the same species. The Swedish scientist Carl von Linne (Linnaeus) was the first to propose the classification system. A botanist, he began by giving plants Latin names and then extended the practice to animals. He named humankind Homo sapiens. He held that all humans belong to a single species, Homo sapiens, divided into subspecies and geographic races; he also held that the human races were distinct from one another, separately created, polygenic in origin. That line of thought goes back to the Greek idea of the "multiple origins" of humankind.
On the Origin of Species: the full original title is On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life. (Note: the English title speaks literally of the origin of "species"; the Japanese translation likewise renders it as 種の起源, "the origin of species"; in China the long-standing translation has been 《物种起源》, and this book keeps the old rendering.)
The Descent of Man: the full original title is The Descent of Man, and Selection in Relation to Sex.
In the 1960s Carleton Coon, one of the world's leading authorities in anthropology and president of the American Association of Physical Anthropologists, published two very influential books: The Origin of Races and The Living Races of Man. In these authoritative works Coon further divided modern humans into five distinct subspecies (in reality geographic races):
Australoid: the Australian race (Australian Aborigines, also called the "brown" race);
Caucasoid: the Caucasian race (Europe, North Africa, West, Central, and South Asia, also called the "white" race, despite varied skin color);
Negroid: the Negro race (Africa south of the Sahara, and small islands and mountain areas of Southeast Asia, also called Congoid or the "black" race);
Capoid: the Cape race (southern Africa, for example the Bushmen, or San);
Mongoloid: the Mongolian race (most of Asia, the Arctic, the Americas, the Pacific islands, also called the "yellow" race).
In January 1987, Rebecca Cann, an American doctoral student in genetics, and her colleagues published a paper in the British journal Nature: "Mitochondrial DNA and Human Evolution." The paper argued that humanity has a single origin, probably in Africa, within the last 200,000 years. However hard it is to believe, the analysis of DNA data showed that everyone on earth today descends from one common ancestor.
The famous mtDNA tree published in Nature in 1987 was a computed analysis of the mitochondrial DNA of 147 people from around the world. The paper's three authors, in order, were Rebecca Cann, Mark Stoneking, and Allan Wilson. It was a milestone.
This is the secret of "mitochondrial Eve."
After Rebecca Cann and her colleagues published the result in 1987, facing fierce controversy and challenge, they began a new study. In September 1987 they published a second paper in Nature, "Disputed African origin of human populations," confirming once more that "mitochondrial Eve" really was in Africa.
But the shapes in the X-ray photographs of DNA were so peculiar that for a long time the two of them could not find a chemical structure for DNA that fit them. In the end Watson and Crick resorted to a clumsy, "primitive" method: with strips of cardboard and pieces of metal plate and wire they built model after model, trying to reproduce the structure of DNA. They finally found that one double-helix model fit the X-ray pattern exactly. The model is simple, like two spiral ladders twisted together, and it is extremely stable; only a stable structure could serve as genetic material. DNA is indeed an extremely stable chemical structure, a sugar backbone built around just four nucleotide bases: A, adenine; C, cytosine; G, guanine; T, thymine.
The clan mother. (A clan mother must have two daughters, not one. The matrilineal ancestor is the most recent common ancestor of these eight women; her own mother is of course also a matrilineal ancestor of all the women who came later, but she is not the most recent one, her daughter is. Her two daughters are likewise matrilineal ancestors of the women who follow, but neither is the common matrilineal ancestor of all eight. In other words, if the diagram is taken as a single clan, only the woman marked MRCA is the clan mother. The same principle applies whether the clan has 8 members or 8 million. (MRCA, Most Recent Common Ancestor.))
In 1996 the American History Channel, in its program on the greatest pharaohs, listed fifteen Egyptian pharaohs; the tenth was Tutankhamun, who reigned from about 1334 to 1325 BC, coming to the throne at eight or nine and ruling for about ten years. This pharaoh accomplished no "great deeds" whatever. His "greatness" lies solely in the fact that his tomb went unrobbed for more than 3,300 years, the only tomb of an Egyptian pharaoh never to have been plundered.
Svante Paabo (1955-), who had recovered DNA from Egyptian mummies and from Neanderthals, became director of the genetics department of the Max Planck Institute for Evolutionary Anthropology in 1997. The institute has five departments and belongs to Germany's Max Planck Society, which counts 32 Nobel laureates, including Svante Paabo's father. In the 1980s, Paabo and his colleagues, among them Allan Wilson (1934-1991), one of the three authors of the "mitochondrial Eve" paper, began work on ancient DNA in Germany and in the United States. They first worked out DNA sequences from Egyptian mummies and soon turned to fossils.
In November 1995, at the Second Euroconference on Population History in Barcelona, a fierce dispute broke out. In his talk, Sykes of Oxford used the evidence of mitochondrial DNA to rebut the mainstream view that Europeans originated from Middle Eastern farmers. In the question period after the talk, supporters of the Wave of Advance raised objection after objection, yet in the face of the DNA data they had little to say. Cavalli-Sforza was in the room; he said little. In the five years after the meeting the fierce argument never stopped. It was another battle over the origin of Europeans.
In science, an international meeting like the Barcelona conference on population history can announce a new finding, but a conference report does not really count; the work must be published in a scientific journal. In publication, a panel of reviewers scrutinizes the problem, the results, and the interpretation; this is peer review, and the reviewers must have no conflict of interest with the authors. Oxford sent its report to the American Journal of Human Genetics and received unusually strict scrutiny: the journal required an appendix further explaining the mathematically dense, hard-to-follow network-construction method published in 1995, and also the addition of traditional tables of population comparisons. From the Barcelona meeting to publication, review dragged on for eight months. At the time, laboratories and universities around the world were all doing DNA research without a common standard, even keeping secrets from one another; methods differed, and the description and numbering of DNA were inconsistent. All of this was only gradually unified after the American Human Genome Project.
Darwin was an even-tempered, matter-of-fact naturalist who loved observation and loved fossils. The long string of assorted titles in front of his name was added by later generations. In On the Origin of Species Darwin did not even use the word "evolution"; he wrote instead of "descent with modification," because he felt that "evolution" implied progress, whereas the hereditary change of species is only adaptation to a changing environment, with no implication of progress or regress. Besides collecting specimens of every kind of species, Darwin collected a great many fossils, but in his day he could not yet sort those fossils out clearly, nor did he have the means to analyze them statistically.
But, just as Darwin surmised, the place with the most fossils in the world is Africa. In the 1920s, fossil hominids began to be unearthed there in large numbers, far more than in Europe or Asia. In 1921, the first hominid fossil was found in Zambia. In 1922, Raymond Dart (1893-1988) was appointed professor of anthropology at the University of the Witwatersrand in South Africa and began building a department. In 1924, Dart confirmed that the fossil found in Zambia was the oldest hominid fossil then known. In 1959, in Kenya, several thousand kilometers from Zambia, Louis Leakey discovered an Australopithecus 1.75 million years old. That discovery roughly doubled the known time depth of early hominids in Africa. Later finds pushed the age of African hominid fossils ever further back and their distribution ever wider.
In the following decades, more and more fossils of the African australopithecines ("Southern Ape Man") were unearthed, in numbers exceeding the rest of the world combined. In the face of the evidence, the theory that humankind originated in Africa gradually won worldwide acceptance.
The dates of the African australopithecines kept reaching further back: three million years, four million years... The newly discovered chimpanzee-like hominid Ardipithecus pushes African hominids back further still, to 5.6 million years ago, in the Miocene. Yet the birth date that Berkeley computed for "mitochondrial Eve," the mother of modern Homo sapiens, is less than 200,000 years ago. How can that be? (In November 1974 Lucy was unearthed in Ethiopia; she was about 20 years old when she died and lived about 3.2 million years ago. Lucy belongs to Australopithecus afarensis, whose relationship to modern humans remains unclear. Lucy has been listed as part of the United Nations' world cultural heritage.)
In 1994 the Nobel laureate Walter Gilbert, with Rob Dorit and Hiroshi Akashi, published a curious paper in Science. What made it curious was that it reported not what they had found but what they had failed to find; its title was "Absence of polymorphism at the ZFY locus on the human Y-chromosome." The three scientists had hoped to find polymorphism on the Y chromosomes of 38 men sampled from different parts of the world, and in the end found none. Astonished, they checked again and again, and still found nothing. In other words, those 38 men in theory came from a single father. A Nobel laureate and two biologists had gone to great lengths only to discover one philandering man who had roamed the world, whose 38 sons then all happened to be collected into this one scientific experiment. That is utterly impossible.
Earlier, Lewontin himself had roughly divided humankind by geographic "descent":
Caucasians (western Eurasia);
Black Africans (Africa south of the Sahara);
Mongoloids (eastern Asia);
South Asian Aborigines (southern India);
Amerinds (the Americas);
Oceanians and Australian Aborigines.
In effect, the theory of parsimony gives us a philosophical time machine, letting us return to ages long vanished, to look around and admire them. The mechanism is intoxicating. Darwin himself was in fact an early, half-comprehending adherent of the idea. Huxley once complained that Darwin was muddled even about his own creed, "natura non facit saltum" (Latin: nature makes no leaps).
Cavalli-Sforza and Edwards measured blood-group frequencies in fifteen populations around the world and analyzed the results by computer: the differences were greatest among Africans, while the frequencies of Europeans and Asians clustered more closely. It was a stirringly clear piece of evidence from human evolutionary history. Cavalli-Sforza said the analysis "made some kind of sense." The results also brought out a regularity in gene frequencies: over time, the frequencies change in a regular way.
In the 1980s, attention turned to a small structure inside the cell: the mitochondrion. By the 2000s it was finally understood that mitochondria are a kind of bacterium retained from the evolution of the first complex cells more than a billion years ago. Our single-celled ancestors engulfed an ancient bacterium that could produce energy inside the cell, and in the end the engulfed bacterium turned from a "parasite" into a sub-cellular power plant. Very fortunately, like a bacterial genome, the mitochondrial genome (mtDNA) exists in only a single copy, which is to say it does not recombine. Light and hope appeared once more.
Now look at the Y chromosome. Like the mitochondrion, it has lost most of its active genes: where an average chromosome pair carries some 1,500 active genes, the Y retains only 21, some of them duplicated, randomly copied. More interestingly, these 21 active genes serve a single project: making a male. One of them determines sex and is called the sex-determining region of the Y, abbreviated SRY. The other active genes determine other male characteristics (appearance, looks, behavior, and so on). The remaining genes on the Y chromosome do nothing at all and are called "junk DNA." That junk may be biological garbage, but for population geneticists it is gold dust.
In 1991 a young applicant arrived at the Cavalli-Sforza laboratory at Stanford: Peter Underhill. Underhill had earned his doctorate in marine biology at the University of Delaware, then moved to California and turned to the use of enzymes in molecular biology. The early 1980s were the beginning of the biotechnology boom, and Silicon Valley was the epicenter of recombinant DNA: how to use enzymes of every kind to cut genes, separate genes, splice genes... Biotechnology and computing reflected each other's glow, and together the two fields were turning the San Francisco Bay Area into a vigorous global center of new technology.
None of the extinct hominids, including Homo erectus, was capable of crossing the open sea. Java Man, for instance, was only about 100 kilometers from the then-continuous Australian landmass, yet it never reached Australia; indeed no trace of any primate has ever been found in Australia. Only modern Homo sapiens could cross the sea. It was the Old Stone Age, and the Paleolithic tools found scattered along the "coastal highway" show that Africans did indeed walk along the shoreline to Australia. Although the evidence along the Indian coast sleeps deep under the sea, a cave in Sri Lanka (the Fa Hien cave) has yielded Paleolithic tools in quantity, confirming that the ancient rapid coastal migration was real. The Australian Aborigines are precisely the earliest colonists of Australasia, part of which has since sunk beneath the sea, and their culture preserves ancestral elements to this day; even now, Australian Aborigines keep a ritual of calling to their African ancestors in song.
If a population's growth is stagnant or shrinking, the distribution takes on a ragged, back-and-forth shape, because genetic drift or natural selection extinguishes some lineages. If the distribution is smooth, it means that the population of modern humans has been growing at a high rate. Harpending and his team collected and analyzed mitochondrial DNA from 25 populations around the world and found that as many as 23 showed exponential growth. On the strength of this work he published the book The 10,000 Year Explosion. Henry Harpending held that this great expansion and migration began about 50,000 years ago, a date that fits well with the time when humans left Africa.
DNA replication works as follows. A battery of small copying machines of different types, the polymerases, first unzips the two strands of the double helix and then laboriously copies the complement of each strand, forming two new double helices, so that one DNA double helix becomes two. There is only one simple, inviolable rule: A always pairs with T, and C always pairs with G. (In 1984 the geneticist Alec Jeffreys (1950-) discovered that short nucleotide sequences of 3 to 30 base pairs can be repeated 20 to 100 times in the genome. He called these repeated blocks "microsatellites," or variable number tandem repeats (VNTRs). The number and position of such segments in the human genome differ from person to person. In the exploration of the human journey, this "microsatellite" technique has been used extensively to tease out the differences between human populations around the world and the small differences between individuals within a population.)
In 1787, the American president Thomas Jefferson wrote in his Notes on the State of Virginia: "...although Asia and America are entirely separate, they are divided only by a narrow strait... the resemblance between the American Indians and the eastern inhabitants of Asia leads us to conjecture that either the former are the descendants of the latter, or the latter of the former..."
The matter did not end there. In 1986 the famous journal Nature carried an astonishing article by the Brazilian archaeologist Niede Guidon (1933-): "Carbon-14 dates point to man in the Americas 32,000 years ago." It described prehistoric remains and rock paintings found in a large group of caves in the state of Piaui in northeastern Brazil. The paintings, more than thirty thousand in all, show ancient rituals, dances, and hunts, as well as animals that went extinct before the last ice age, such as the glyptodont and the giant armadillo. The sites have also yielded quantities of pottery and the world's earliest painted depictions of boats. The article caused an uproar in the study of the peopling of the Americas, and the precise dates of these remains are still disputed.
In 1999, Fabricio Santos and Chris Tyler-Smith at Oxford, and Tanya Karafet and Mike Hammer at the University of Arizona, independently reported that the ancestor of M3 was a previously undefined nucleotide change on the Y chromosome, a marker called 92R7. They found 92R7 across the whole of Eurasia, from Europe to India. Together with other nucleotide changes, 92R7 confirmed that Siberia was the source of the Native Americans, a conclusion that also corroborated the mitochondrial DNA results. The researchers found it hard, however, to date the 92R7 lineage, because the marker is so widespread.
In 1786, Sir William Jones (1746-1794), a linguist serving as a judge in British India who had noticed that Sanskrit closely resembles Latin and Greek, proposed on the basis of extensive study the concept of the Indo-European language family: the idea that all the languages of the broad region from Europe to India share a common origin. The hypothesis eventually won general acceptance.
Polynesia is the collective name for the several thousand islands of the central Pacific, and the Polynesians are the collective name for their original inhabitants, from the natives of Hawaii to the Maori of New Zealand. The word Polynesia comes from Greek: poly, many, and nesoi, islands. The French writer Charles de Brosses (1706-1777) first used it in 1756, when it referred to all the islands of the Pacific. Even now Polynesia has no strictly defined boundaries: the waters you cross flying from Los Angeles to Auckland, New Zealand, about 12,000 kilometers and some 14 hours in the air, are Polynesia, a vast expanse of ocean below the aircraft scattered with thousands of islands.
Because the more human DNA samples collected the better, the project was opened to as many groups as possible, including those whose backgrounds are complicated and whose genetic patterns are extremely hard to read. Anyone curious about their own DNA could buy a self-testing kit, the Genographic Project Public Participation Kit, and add their own absorbing DNA story through the project's eleven collection and testing centers around the world; in the end, all the data flowed over the internet into the database for combined analysis.
The most typical example of close-kin marriage among Europe's royal families is the House of Habsburg. The most powerful dynasty in European history, it arose in Austria and Hungary and enlarged its political alliances through marriage. At its height the Habsburgs had married into almost every royal house in Europe, so that between the sixteenth and eighteenth centuries the royal families of Europe were so closely related that, in genetic terms, they had almost become a single "small village." The dynasty passed down from generation to generation not only wealth and power but also genetic markers and assorted physical defects. In genetic terms the dynasty became endogamous, and the steadily deteriorating royal offspring were one important reason the House of Habsburg finally fell apart.
M168 is also called the "Eurasian Adam" or the "Out of Africa Adam." This man's Y-chromosome mutation occurred 60,000 to 79,000 years ago, somewhere in the Ethiopia-Sudan region of East Africa. We do not know who M168 was; some scholars think he "may have been a chieftain under polygyny." The men who left Africa were not all descendants of M168, but M168 is so far the only male Y-chromosome lineage that has never died out.
The haplogroup codes used by the Genographic Project. Image source: National Geographic Society.
M168, too, has three important descendants: M130, YAP, and M89. About 60,000 years ago, the first people to leave Africa were M168's descendants carrying M130. Some of the M130 people followed the coastline all the way to Australia; others stayed in the Indian subcontinent and Southeast Asia and then pressed north into eastern Asia, the Tibetan plateau, inland China, Mongolia, Korea, and Japan, and some went on into North America.
The first DNA evidence came precisely from that university librarian, Virumandi. One of his genetic markers is called RPS4Y, short for Ribosomal Protein S4 on the Y chromosome. RPS4Y is now known simply as M130, the 130th genetic marker identified on the Y chromosome. Among the peoples of southern India, including the Kallar, the frequency of M130 is only about 5 percent; among Australian Aborigines it is the dominant marker, above 50 percent; in Southeast Asia it is about 20 percent; and it has also been found in northern India.
On February 5, 2010, The New Zealand Herald ran a report titled "Darwin family DNA shows African origin." In 1986, Darwin's direct descendant Chris Darwin had emigrated to Australia and settled in the Blue Mountains west of Sydney. In 2010 Chris Darwin had his DNA analyzed by the Genographic Project, which confirmed that Darwin's family left Africa about 40,000 years ago, traveled through the Middle East and Central Asia into Europe, made its way into Spain during the last ice age, and then moved north to Britain.
1 Introduction to Deep Learning
2 Conceptual Foundations
3 Neural Networks: The Building Blocks of Deep Learning
4 A Brief History of Deep Learning
5 Convolutional and Recurrent Neural Networks
6 Learning Functions
7 The Future of Deep Learning
1 Introduction to Deep Learning
Deep learning is the subfield of artificial intelligence that focuses on creating large neural network models that are capable of making accurate data-driven decisions. Deep learning is particularly suited to contexts where the data is complex and where there are large datasets available. Today most online companies and high-end consumer technologies use deep learning. Among other things, Facebook uses deep learning to analyze text in online conversations. Google, Baidu, and Microsoft all use deep learning for image search, and also for machine translation. All modern smart phones have deep learning systems running on them; for example, deep learning is now the standard technology for speech recognition, and also for face detection on digital cameras. In the healthcare sector, deep learning is used to process medical images (X-rays, CT, and MRI scans) and diagnose health conditions. Deep learning is also at the core of self-driving cars, where it is used for localization and mapping, motion planning and steering, and environment perception, as well as tracking driver state.
Perhaps the best-known example of deep learning is DeepMind’s AlphaGo.1 Go is a board game similar to Chess. AlphaGo was the first computer program to beat a professional Go player. In March 2016, it beat the top Korean professional, Lee Sedol, in a match watched by more than two hundred million people. The following year, in 2017, AlphaGo beat the world’s No. 1 ranking player, China’s Ke Jie.
In 2016 AlphaGo's success was very surprising. At the time, most people expected that it would take many more years of research before a computer would be able to compete with top-level human Go players. It had been known for a long time that programming a computer to play Go was much more difficult than programming it to play Chess. There are many more board configurations possible in Go than there are in Chess. This is because Go has a larger board and simpler rules than Chess. There are, in fact, more possible board configurations in Go than there are atoms in the universe. This massive search space and Go's large branching factor (the number of board configurations that can be reached in one move) make Go an incredibly challenging game for both humans and computers.
One way of illustrating the relative difficulty Go and Chess presented to computer programs is through a historical comparison of how Go and Chess programs competed with human players. In 1967, MIT's MacHack-6 Chess program could successfully compete with humans and had an Elo rating2 well above novice level, and, by May 1997, Deep Blue was capable of beating the Chess world champion Garry Kasparov. In comparison, the first complete Go program wasn't written until 1968 and strong human players were still able to easily beat the best Go programs in 1997.
The time lag between the development of Chess and Go computer programs reflects the difference in computational difficulty between these two games. However, a second historic comparison between Chess and Go illustrates the revolutionary impact that deep learning has had on the ability of computer programs to compete with humans at Go. It took thirty years for Chess programs to progress from human level competence in 1967 to world champion level in 1997. However, with the development of deep learning it took only seven years for computer Go programs to progress from advanced amateur to world champion; as recently as 2009 the best Go program in the world was rated at the low-end of advanced amateur. This acceleration in performance through the use of deep learning is nothing short of extraordinary, but it is also indicative of the types of progress that deep learning has enabled in a number of fields.
AlphaGo uses deep learning to evaluate board configurations and to decide on the next move to make. The fact that AlphaGo used deep learning to decide what move to make next is a clue to understanding why deep learning is useful across so many different domains and applications. Decision-making is a crucial part of life. One way to make decisions is to base them on your “intuition” or your “gut feeling.” However, most people would agree that the best way to make decisions is to base them on the relevant data. Deep learning enables data-driven decisions by identifying and extracting patterns from large datasets that accurately map from sets of complex inputs to good decision outcomes.
Artificial Intelligence, Machine Learning, and Deep Learning
Deep learning has emerged from research in artificial intelligence and machine learning. Figure 1.1 illustrates the relationship between artificial intelligence, machine learning, and deep learning.
The field of artificial intelligence was born at a workshop at Dartmouth College in the summer of 1956. Research on a number of topics was presented at the workshop including mathematical theorem proving, natural language processing, planning for games, computer programs that could learn from examples, and neural networks. The modern field of machine learning draws on the last two topics: computers that could learn from examples, and neural network research.
Figure 1.1 The relationship between artificial intelligence, machine learning, and deep learning.
Machine learning involves the development and evaluation of algorithms that enable a computer to extract (or learn) functions from a dataset (sets of examples). To understand what machine learning means we need to understand three terms: dataset, algorithm, and function.
In its simplest form, a dataset is a table where each row contains the description of one example from a domain, and each column contains the information for one of the features in a domain. For example, table 1.1 illustrates an example dataset for a loan application domain. This dataset lists the details of four example loan applications. Excluding the ID feature, which is only for ease of reference, each example is described using three features: the applicant’s annual income, their current debt, and their credit solvency.
Table 1.1. A dataset of loan applicants and their known credit solvency ratings

ID   Annual Income   Current Debt   Credit Solvency
1    $150            -$100           100
2    $250            -$300           -50
3    $450            -$250           400
4    $200            -$350          -300
An algorithm is a process (or recipe, or program) that a computer can follow. In the context of machine learning, an algorithm defines a process to analyze a dataset and identify recurring patterns in the data. For example, the algorithm might find a pattern that relates a person’s annual income and current debt to their credit solvency rating. In mathematics, relationships of this type are referred to as functions.
A function is a deterministic mapping from a set of input values to one or more output values. The fact that the mapping is deterministic means that for any specific set of inputs a function will always return the same outputs. For example, addition is a deterministic mapping, and so 2+2 is always equal to 4. As we will discuss later, we can create functions for domains that are more complex than basic arithmetic; for example, we can define a function that takes a person's income and debt as inputs and returns their credit solvency rating as the output value. The concept of a function is very important to deep learning so it is worth repeating the definition for emphasis: a function is simply a mapping from inputs to outputs. In fact, the goal of machine learning is to learn functions from data. A function can be represented in many different ways: it can be as simple as an arithmetic operation (e.g., addition and subtraction are both functions that take inputs and return a single output), a sequence of if-then-else rules, or it can have a much more complex representation.
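As a toy illustration of this idea (the weights below are invented for illustration, not learned from table 1.1), a credit-solvency "function" might look like:

```python
def credit_solvency(annual_income, current_debt):
    """A hypothetical hand-written scoring rule, just to show that a 'function' here is
    nothing more than a deterministic mapping from inputs to an output. The weights are
    placeholders, not values learned from the data in table 1.1."""
    return 0.8 * annual_income + 0.5 * current_debt

print(credit_solvency(150, -100))   # always returns the same output for the same inputs
```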
One way to represent a function is to use a neural network. Deep learning is the subfield of machine learning that focuses on deep neural network models. In fact, the patterns that deep learning algorithms extract from datasets are functions that are represented as neural networks. Figure 1.2 illustrates the structure of a neural network. The boxes on the left of the figure represent the memory locations where inputs are presented to the network. Each of the circles in this figure is called a neuron and each neuron implements a function: it takes a number of values as input and maps them to an output value. The arrows in the network show how the outputs of each neuron are passed as inputs to other neurons. In this network, information flows from left to right. For example, if this network were trained to predict a person’s credit solvency, based on their income and debt, it would receive the income and debt as inputs on the left of the network and output the credit solvency score through the neuron on the right.
A neural network uses a divide-and-conquer strategy to learn a function: each neuron in the network learns a simple function, and the overall (more complex) function, defined by the network, is created by combining these simpler functions. Chapter 3 will describe how a neural network processes information.
Figure 1.2 Schematic illustration of a neural network.
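As a rough sketch of this divide-and-conquer idea, the following few lines wire two hidden neurons and one output neuron together in the left-to-right style of figure 1.2; all weights are arbitrary placeholders rather than trained values:

```python
import numpy as np

def neuron(inputs, weights, bias):
    """One neuron: a weighted sum of its inputs passed through a simple nonlinearity."""
    return np.tanh(np.dot(inputs, weights) + bias)

def tiny_network(income, debt):
    """Two hidden neurons feeding one output neuron; the overall function is built by
    combining the simple functions computed by the individual neurons."""
    x = np.array([income, debt])
    h1 = neuron(x, np.array([0.01, 0.02]), 0.0)
    h2 = neuron(x, np.array([-0.03, 0.01]), 0.1)
    return neuron(np.array([h1, h2]), np.array([1.5, -0.5]), 0.2)

print(tiny_network(150, -100))
```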
What Is Machine Learning?
A machine learning algorithm is a search process designed to choose the best function, from a set of possible functions, to explain the relationships between features in a dataset. To get an intuitive understanding of what is involved in extracting, or learning, a function from data, examine the following set of sample inputs to an unknown function and the outputs it returns (the same data that reappears later as table 1.2): inputs 5 and 5 map to 25, inputs 2 and 6 map to 12, inputs 4 and 4 map to 16, and inputs 2 and 2 map to 4. Given these examples, decide which arithmetic operation (addition, subtraction, multiplication, or division) is the best choice to explain the mapping the unknown function defines between its inputs and output.
Most people would agree that multiplication is the best choice because it provides the best match to the observed relationship, or mapping, from the inputs to the outputs: 5 × 5 = 25, 2 × 6 = 12, 4 × 4 = 16, and 2 × 2 = 4.
In this particular instance, choosing the best function is relatively straightforward, and a human can do it without the aid of a computer. However, as the number of inputs to the unknown function increases (perhaps to hundreds or thousands of inputs), and the variety of potential functions to be considered gets larger, the task becomes much more difficult. It is in these contexts that harnessing the power of machine learning to search for the best function, to match the patterns in the dataset, becomes necessary.
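As a minimal sketch of such a search, the following code scores each of the four arithmetic operations against the sample mappings above (the data of table 1.2), using an exact-match count as the measure of fitness; the code is an illustration, not the book's own implementation:

```python
import operator

# The four sample mappings of the unknown function (the same data as table 1.2).
examples = [(5, 5, 25), (2, 6, 12), (4, 4, 16), (2, 2, 4)]

candidates = {
    "addition": operator.add,
    "subtraction": operator.sub,
    "multiplication": operator.mul,
    "division": operator.truediv,
}

def fitness(fn):
    """Fitness = how many examples the candidate function reproduces exactly."""
    return sum(1 for a, b, target in examples if fn(a, b) == target)

scores = {name: fitness(fn) for name, fn in candidates.items()}
print(scores)                       # multiplication matches all four examples
print(max(scores, key=scores.get))  # the 'best function' returned by the search
```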
Machine learning involves a two-step process: training and inference. During training, a machine learning algorithm processes a dataset and chooses the function that best matches the patterns in the data. The extracted function will be encoded in a computer program in a particular form (such as if-then-else rules or parameters of a specified equation). The encoded function is known as a model, and the analysis of the data in order to extract the function is often referred to as training the model. Essentially, models are functions encoded as computer programs. However, in machine learning the concepts of function and model are so closely related that the distinction is often skipped over and the terms may even be used interchangeably.
In the context of deep learning, the relationship between functions and models is that the function extracted from a dataset during training is represented as a neural network model, and conversely a neural network model encodes a function as a computer program. The standard process used to train a neural network is to begin training with a neural network where the parameters of the network are randomly initialized (we will explain network parameters later; for now just think of them as values that control how the function the network encodes works). This randomly initialized network will be very inaccurate in terms of its ability to match the relationship between the various input values and target outputs for the examples in the dataset. The training process then proceeds by iterating through the examples in the dataset, and, for each example, presenting the input values to the network and then using the difference between the output returned by the network and the correct output for the example listed in the dataset to update the network’s parameters so that it matches the data more closely. Once the machine learning algorithm has found a function that is sufficiently accurate (in terms of the outputs it generates matching the correct outputs listed in the dataset) for the problem we are trying to solve, the training process is completed, and the final model is returned by the algorithm. This is the point at which the learning in machine learning stops.
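A minimal sketch of this training loop, using a two-parameter linear model in place of a neural network (the data is that of table 1.2; the learning rate and number of passes are arbitrary choices made for illustration):

```python
import numpy as np

data = np.array([[5, 5, 25], [2, 6, 12], [4, 4, 16], [2, 2, 4]], dtype=float)
inputs, targets = data[:, :2], data[:, 2]

rng = np.random.default_rng(0)
weights = rng.normal(size=2)               # start from randomly initialized parameters

learning_rate = 0.01
for epoch in range(200):                   # iterate repeatedly through the dataset
    for x, target in zip(inputs, targets):
        output = x @ weights               # present the inputs to the model
        error = output - target            # compare with the correct output in the data
        weights -= learning_rate * error * x   # nudge the parameters to reduce the error

print(weights)                             # the fixed parameters returned after training
print(inputs @ weights)                    # the model's outputs for the training examples
```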
Once training has finished, the model is fixed. The second stage in machine learning is inference. This is when the model is applied to new examples—examples for which we do not know the correct output value, and therefore we want the model to generate estimates of this value for us. Most of the work in machine learning is focused on how to train accurate models (i.e., extracting an accurate function from data). This is because the skills and methods required to deploy a trained machine learning model into production, in order to do inference on new examples at scale, are different from those that a typical data scientist will possess. There is a growing recognition within the industry of the distinctive skills needed to deploy artificial intelligence systems at scale, and this is reflected in a growing interest in the field known as DevOps, a term describing the need for collaboration between development and operations teams (the operations team being the team responsible for deploying a developed system into production and ensuring that these systems are stable and scalable). The terms MLOps, for machine learning operations, and AIOps, for artificial intelligence operations, are also used to describe the challenges of deploying a trained model. The questions around model deployment are beyond the scope of this book, so we will instead focus on describing what deep learning is, what it can be used for, how it has evolved, and how we can train accurate deep learning models.
One relevant question here is: why is extracting a function from data useful? The reason is that once a function has been extracted from a dataset it can be applied to unseen data, and the values returned by the function in response to these new inputs can provide insight into the correct decisions for these new problems (i.e., it can be used for inference). Recall that a function is simply a deterministic mapping from inputs to outputs. The simplicity of this definition, however, hides the variety that exists within the set of functions. Consider the following examples:
• Spam filtering is a function that takes an email as input and returns a value that classifies the email as spam (or not).
• Face recognition is a function that takes an image as input and returns a labeling of the pixels in the image that demarcates the face in the image.
• Gene prediction is a function that takes a genomic DNA sequence as input and returns the regions of the DNA that encode a gene.
• Speech recognition is a function that takes an audio speech signal as input and returns a textual transcription of the speech.
• Machine translation is a function that takes a sentence in one language as input and returns the translation of that sentence in another language.
It is because the solutions to so many problems across so many domains can be framed as functions that machine learning has become so important in recent years.
Why Is Machine Learning Difficult?
There are a number of factors that make the machine learning task difficult, even with the help of a computer. First, most datasets will include noise3 in the data, so searching for a function that matches the data exactly is not necessarily the best strategy to follow, as it is equivalent to learning the noise. Second, it is often the case that the set of possible functions is larger than the set of examples in the dataset. This means that machine learning is an ill-posed problem: the information given in the problem is not sufficient to find a single best solution; instead multiple possible solutions will match the data. We can use the problem of selecting the arithmetic operation (addition, subtraction, multiplication, or division) that best matches a set of example input-output mappings for an unknown function to illustrate the concept of an ill-posed problem. Here are the example mappings for this function selection problem:
Given these examples, multiplication and division are better matches for the unknown function than addition and subtraction. However, it is not possible to decide whether the unknown function is actually multiplication or division using this sample of data, because both operations are consistent with all the examples provided. Consequently, this is an ill-posed problem: it is not possible to select a single best answer given the information provided in the problem.
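The original example mappings are not reproduced here, so the following sketch uses a hypothetical sample in which the second input is always 1; that is enough to make the point, since multiplication and division then agree on every example and the data alone cannot distinguish them:

```python
import operator

# Hypothetical mappings of the kind the text describes (not the book's own numbers).
examples = [(6, 1, 6), (4, 1, 4), (9, 1, 9)]

candidates = {
    "addition": operator.add,
    "subtraction": operator.sub,
    "multiplication": operator.mul,
    "division": operator.truediv,
}

for name, fn in candidates.items():
    matches = sum(1 for a, b, t in examples if fn(a, b) == t)
    print(name, matches)   # multiplication and division both match every example,
                           # so the problem is ill-posed given this data
```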
One strategy to solve an ill-posed problem is to collect more data (more examples) in the hope that the new examples will help us to discriminate between the correct underlying function and the remaining alternatives. Frequently, however, this strategy is not feasible, either because the extra data is not available or is too expensive to collect. Instead, machine learning algorithms overcome the ill-posed nature of the machine learning task by supplementing the information provided by the data with a set of assumptions about the characteristics of the best function, and use these assumptions to influence the process used by the algorithm that selects the best function (or model). These assumptions are known as the inductive bias of the algorithm because in logic a process that infers a general rule from a set of specific examples is known as inductive reasoning. For example, if all the swans that you have seen in your life are white, you might induce from these examples the general rule that all swans are white. This concept of inductive reasoning relates to machine learning because a machine learning algorithm induces (or extracts) a general rule (a function) from a set of specific examples (the dataset). Consequently, the assumptions that bias a machine learning algorithm are, in effect, biasing an inductive reasoning process, and this is why they are known as the inductive bias of the algorithm.
So, a machine learning algorithm uses two sources of information to select the best function: one is the dataset, and the other (the inductive bias) is the assumptions that bias the algorithm to prefer some functions over others, irrespective of the patterns in the dataset. The inductive bias of a machine learning algorithm can be understood as providing the algorithm with a perspective on a dataset. However, just as in the real world, where there is no single best perspective that works in all situations, there is no single best inductive bias that works well for all datasets. This is why there are so many different machine learning algorithms: each algorithm encodes a different inductive bias. The assumptions encoded in the design of a machine learning algorithm can vary in strength. The stronger the assumptions, the less freedom the algorithm is given in selecting a function that fits the patterns in the dataset. In a sense, the dataset and inductive bias counterbalance each other: machine learning algorithms that have a strong inductive bias pay less attention to the dataset when selecting a function. For example, if a machine learning algorithm is coded to prefer a very simple function, no matter how complex the patterns in the data, then it has a very strong inductive bias.
In chapter 2 we will explain how we can use the equation of a line as a template structure to define a function. The equation of the line is a very simple type of mathematical function. Machine learning algorithms that use the equation of a line as the template structure for the functions they fit to a dataset make the assumption that the model they generate should encode a simple linear mapping from inputs to output. This assumption is an example of an inductive bias. It is, in fact, an example of a strong inductive bias, as no matter how complex (or nonlinear) the patterns in the data are the algorithm will be restricted (or biased) to fit a linear model to it.
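A quick illustration of such a strong linear bias, fitting a straight line to data generated from a quadratic curve; the numbers are invented for illustration:

```python
import numpy as np

# Nonlinear data (y = x^2) fitted by an algorithm whose inductive bias restricts it
# to the equation of a line. The bias is so strong that the curvature is ignored.
x = np.arange(-5, 6, dtype=float)
y = x ** 2

slope, intercept = np.polyfit(x, y, deg=1)         # best straight line through the data
print(slope, intercept)                             # roughly slope 0, intercept 10
print(np.mean((slope * x + intercept - y) ** 2))    # large error: the line underfits
```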
One of two things can go wrong if we choose a machine learning algorithm with the wrong bias. First, if the inductive bias of a machine learning algorithm is too strong, then the algorithm will ignore important information in the data and the returned function will not capture the nuances of the true patterns in the data. In other words, the returned function will be too simple for the domain,4 and the outputs it generates will not be accurate. This outcome is known as the function underfitting the data. Alternatively, if the bias is too weak (or permissive), the algorithm is allowed too much freedom to find a function that closely fits the data. In this case, the returned function is likely to be too complex for the domain, and, more problematically, the function is likely to fit to the noise in the sample of the data that was supplied to the algorithm during training. Fitting to the noise in the training data will reduce the function's ability to generalize to new data (data that is not in the training sample). This outcome is known as overfitting the data. Finding a machine learning algorithm that balances data and inductive bias appropriately for a given domain is the key to learning a function that neither underfits nor overfits the data, and that, therefore, generalizes successfully in that domain (i.e., that is accurate at inference, or processing new examples that were not in the training data).
However, in domains that are complex enough to warrant the use of machine learning, it is not possible in advance to know what are the correct assumptions to use to bias the selection of the correct model from the data. Consequently, data scientists must use their intuition (i.e., make informed guesses) and also use trial-and-error experimentation in order to find the best machine learning algorithm to use in a given domain.
Neural networks have a relatively weak inductive bias. As a result, generally, the danger with deep learning is that the neural network model will overfit, rather than underfit, the data. It is because neural networks pay so much attention to the data that they are best suited to contexts where there are very large datasets. The larger the dataset, the more information the data provides, and therefore it becomes more sensible to pay more attention to the data. Indeed, one of the most important factors driving the emergence of deep learning over the last decade has been the emergence of Big Data. The massive datasets that have become available through online social platforms and the proliferation of sensors have combined to provide the data necessary to train neural network models to support new applications in a range of domains. To give a sense of the scale of the big data used in deep learning research, Facebook’s face recognition software, DeepFace, was trained on a dataset of four million facial images belonging to more than four thousand identities (Taigman et al. 2014).
The Key Ingredients of Machine Learning
The above example of deciding which arithmetic operation best explains the relationship between inputs and outputs in a set of data illustrates the three key ingredients in machine learning:
1. Data (a set of historical examples).
2. A set of functions that the algorithm will search through to find the best match with the data.
3. Some measure of fitness that can be used to evaluate how well each candidate function matches the data.
All three of these ingredients must be correct if a machine learning project is to succeed; below we describe each of these ingredients in more detail.
We have already introduced the concept of a dataset as a two-dimensional table (or n × m matrix),5 where each row contains the information for one example, and each column contains the information for one of the features in the domain. For example, table 1.2 illustrates how the sample inputs and outputs of the first unknown arithmetic function problem in the chapter can be represented as a dataset. This dataset contains four examples (also known as instances), and each example is represented using two input features and one output (or target) feature. Designing and selecting the features to represent the examples is a very important step in any machine learning project.
As is so often the case in computer science, and machine learning, there is a tradeoff in feature selection. If we choose to include only a minimal number of features in the dataset, then it is likely that a very informative feature will be excluded from the data, and the function returned by the machine learning algorithm will not work well. Conversely, if we choose to include as many features as possible in the domain, then it is likely that irrelevant or redundant features will be included, and this will also likely result in the function not working well. One reason for this is that the more redundant or irrelevant features that are included, the greater the probability for the machine learning algorithm to extract patterns that are based on spurious correlations between these features. In these cases, the algorithm gets confused between the real patterns in the data and the spurious patterns that only appear in the data due to the particular sample of examples that have been included in the dataset.
Finding the correct set of features to include in a dataset involves engaging with experts who understand the domain, using statistical analysis of the distribution of individual features and also the correlations between pairs of features, and a trial-and-error process of building models and checking the performance of the models when particular features are included or excluded. This process of dataset design is a labor-intensive task that often takes up a significant portion of the time and effort expended on a machine learning project. It is, however, a critical task if the project is to succeed. Indeed, identifying which features are informative for a given task is frequently where the real value of machine learning projects emerge.
The second ingredient in a machine learning project is the set of candidate functions that the algorithm will consider as the potential explanation of the patterns in the data. In the unknown arithmetic function scenario previously given, the set of considered functions was explicitly specified and restricted to four: addition, subtraction, multiplication, or division. More generally, the set of functions is implicitly defined through the inductive bias of the machine learning algorithm and the function representation (or model) that is being used. For example, a neural network model is a very flexible function representation.
Table 1.2. A simple tabular dataset

Input 1   Input 2   Target
5         5         25
2         6         12
4         4         16
2         2          4
The third and final ingredient to machine learning is the measure of fitness. The measure of fitness is a function that takes the outputs from a candidate function, generated when the machine learning algorithm applies the candidate function to the data, and compares these outputs with the data, in some way. The result of this comparison is a value that describes the fitness of the candidate function relative to the data. A fitness function that would work for our unknown arithmetic function scenario is to count in how many of the examples a candidate function returns a value that exactly matches the target specified in the data. Multiplication would score four out of four on this fitness measure, addition would score one out of four, and division and subtraction would both score zero out of four. There are a large variety of fitness functions that can be used in machine learning, and the selection of the correct fitness function is crucial to the success of a machine learning project. The design of new fitness functions is a rich area of research in machine learning. Varying how the dataset is represented, and how the candidate functions and the fitness function are defined, results in three different categories of machine learning: supervised, unsupervised, and reinforcement learning.
Supervised, Unsupervised, and Reinforcement Learning
Supervised machine learning is the most common type of machine learning. In supervised machine learning, each example in the dataset is labeled with the expected output (or target) value. For example, if we were using the dataset in table 1.1 to learn a function that maps from the inputs of annual income and debt to a credit solvency score, the credit solvency feature in the dataset would be the target feature. In order to use supervised machine learning, our dataset must list the value of the target feature for every example in the dataset. These target feature values can sometimes be very difficult, and expensive, to collect. In some cases, we must pay human experts to label each example in a dataset with the correct target value. However, the benefit of having these target values in the dataset is that the machine learning algorithm can use these values to help the learning process. It does this by comparing the outputs a function produces with the target outputs specified in the dataset, and using the difference (or error) to evaluate the fitness of the candidate function, and use the fitness evaluation to guide the search for the best function. It is because of this feedback from the target labels in the dataset to the algorithm that this type of machine learning is considered supervised. This is the type of machine learning that was demonstrated by the example of choosing between different arithmetic functions to explain the behavior of an unknown function.
Unsupervised machine learning is generally used for clustering data. For example, this type of data analysis is useful for customer segmentation, where a company wishes to segment its customer base into coherent groups so that it can target marketing campaigns and/or product designs to each group. In unsupervised machine learning, there are no target values in the dataset. Consequently, the algorithm cannot directly evaluate the fitness of a candidate function against the target values in the dataset. Instead, the machine learning algorithm tries to identify functions that map similar examples into clusters, such that the examples in a cluster are more similar to the other examples in the same cluster than they are to examples in other clusters. Note that the clusters are not prespecified, or at most they are initially very underspecified. For example, the data scientist might provide the algorithm with a target number of clusters, based on some intuition about the domain, without providing explicit information on relative sizes of the clusters or regarding the characteristics of examples that belong in each cluster. Unsupervised machine learning algorithms often begin by guessing an initial clustering of the examples and then iteratively adjusting the clusters (by dropping instances from one cluster and adding them to another) so as to improve the fitness of the cluster set. The fitness functions used in unsupervised machine learning generally reward candidate functions that result in higher similarity within individual clusters and, also, high diversity between clusters.
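As a rough illustration of this iterative reassignment process, the following sketch implements a minimal k-means-style clustering loop. The customer data and the choice of two clusters are invented for the example; they are not from the text:

```python
import numpy as np

def cluster(examples, k, iterations=100, seed=0):
    rng = np.random.default_rng(seed)
    # Guess an initial clustering: pick k examples as the initial cluster centres.
    centres = examples[rng.choice(len(examples), size=k, replace=False)]
    for _ in range(iterations):
        # Assign each example to its nearest centre.
        distances = np.linalg.norm(examples[:, None, :] - centres[None, :, :], axis=2)
        assignment = distances.argmin(axis=1)
        # Move each centre to the mean of the examples currently assigned to it.
        new_centres = np.array([
            examples[assignment == c].mean(axis=0) if (assignment == c).any() else centres[c]
            for c in range(k)
        ])
        if np.allclose(new_centres, centres):
            break                      # the clusters have stopped changing
        centres = new_centres
    return assignment

# Toy customer data: (annual spend, visits per month) for six customers.
customers = np.array([[100, 2], [120, 3], [90, 2],
                      [900, 20], [950, 22], [1000, 25]], dtype=float)
print(cluster(customers, k=2))   # the low-spend and high-spend customers separate
```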
Reinforcement learning is most relevant for online control tasks, such as robot control and game playing. In these scenarios, an agent needs to learn a policy for how it should act in an environment in order to be rewarded. In reinforcement learning, the goal of the agent is to learn a mapping from its current observation of the environment and its own internal state (its memory) to what action it should take: for instance, should the robot move forward or backward or should the computer program move the pawn or take the queen. The output of this policy (function) is the action that the agent should take next, given the current context. In these types of scenarios, it is difficult to create historic datasets, and so reinforcement learning is often carried out in situ: an agent is released into an environment where it experiments with different policies (starting with a potentially random policy) and over time updates its policy in response to the rewards it receives from the environment. If an action results in a positive reward, the mapping from the relevant observations and state to that action is reinforced in the policy, whereas if an action results in a negative reward, the mapping is weakened. Unlike in supervised and unsupervised machine learning, in reinforcement learning, the fact that learning is done in situ means that the training and inference stages are interleaved and ongoing. The agent infers what action it should do next and uses the feedback from the environment to learn how to update its policy. A distinctive aspect of reinforcement learning is that the target output of the learned function (the agent’s actions) is decoupled from the reward mechanism. The reward may be dependent on multiple actions and there may be no reward feedback, either positive or negative, available directly after an action has been performed. For example, in a chess scenario, the reward may be +1 if the agent wins the game and -1 if the agent loses. However, this reward feedback will not be available until the last move of the game has been completed. So, one of the challenges in reinforcement learning is designing training mechanisms that can distribute the reward appropriately back through a sequence of actions so that the policy can be updated appropriately. Google’s DeepMind Technologies generated a lot of interest by demonstrating how reinforcement learning could be used to train a deep learning model to learn control policies for seven different Atari computer games (Mnih et al. 2013). The input to the system was the raw pixel values from the screen, and the control policies specified what joystick action the agent should take at each point in the game. Computer game environments are particularly suited to reinforcement learning as the agent can be allowed to play many thousands of games against the computer game system in order to learn a successful policy, without incurring the cost of creating and labeling a large dataset of example situations with correct joystick actions. The DeepMind system got so good at the games that it outperformed all previous computer systems on six of the seven games, and outperformed human experts on three of the games.
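The text does not commit to a particular reinforcement learning algorithm, but tabular Q-learning is one standard way to implement the reinforce-or-weaken updates described above. The corridor environment, reward values, and hyperparameters in the sketch below are invented for illustration:

```python
import random

N_STATES, ACTIONS = 5, ["left", "right"]
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1      # learning rate, discount, exploration rate
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    # Environment dynamics: move along the corridor; +1 reward only at the goal state.
    nxt = max(0, state - 1) if action == "left" else min(N_STATES - 1, state + 1)
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0)

def choose_action(state):
    # Epsilon-greedy policy: mostly exploit the current policy, occasionally explore.
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    best = max(Q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(state, a)] == best])

for _ in range(200):                        # learning happens in situ, episode by episode
    state = 0
    while state != N_STATES - 1:
        action = choose_action(state)
        nxt, reward = step(state, action)
        # Q-learning update: reinforce or weaken the mapping from (state, action),
        # passing the goal reward back through earlier actions via the max-Q term.
        best_next = max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = nxt

print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)})
# After training, the learned policy moves "right" in every non-goal state.
```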
Deep learning can be applied to all three machine learning scenarios: supervised, unsupervised, and reinforcement. Supervised machine learning is, however, the most common type of machine learning. Consequently, the majority of this book will focus on deep learning in a supervised learning context. However, most of the deep learning concerns and principles introduced in the supervised learning context also apply to unsupervised and reinforcement learning.
Why Is Deep Learning So Successful?
In any data-driven process the primary determinant of success is knowing what to measure and how to measure it. This is why the processes of feature selection and feature design are so important to machine learning. As discussed above, these tasks can require domain expertise, statistical analysis of the data, and iterations of experiments building models with different feature sets. Consequently, dataset design and preparation can consume a significant portion of the time and resources expended on a project, in some cases up to 80% of the total project budget (Kelleher and Tierney 2018). Feature design is one task in which deep learning can have a significant advantage over traditional machine learning. In traditional machine learning, the design of features often requires a large amount of human effort. Deep learning takes a different approach to feature design, by attempting to automatically learn the features that are most useful for the task from the raw data.
In any data-driven process the primary determinant of success is knowing what to measure and how to measure it.
To give an example of feature design, a person's body mass index (BMI) is their weight (in kilograms) divided by the square of their height (in meters). In a medical setting, BMI is used to categorize people as underweight, normal, overweight, or obese. Categorizing people in this way can be useful in predicting the likelihood of a person developing a weight-related medical condition, such as diabetes. BMI is used for this categorization because it enables doctors to categorize people in a manner that is relevant to these weight-related medical conditions. Generally, as people get taller they also get heavier. However, most weight-related medical conditions (such as diabetes) are not affected by a person's height but rather by the amount they are overweight compared to other people of a similar stature. BMI is a useful feature to use for the medical categorization of a person's weight because it takes the effect of height on weight into account. BMI is an example of a feature that is derived (or calculated) from raw features; in this case the raw features are weight and height. BMI is also an example of how a derived feature can be more useful in making a decision than the raw features that it is derived from. BMI is a hand-designed feature: Adolphe Quetelet designed it in the nineteenth century.
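As a small illustration, here is the BMI calculation written as a derived feature, using commonly cited category thresholds (the weight and height values are made up):

```python
def bmi(weight_kg, height_m):
    # Derived feature: weight divided by the square of height.
    return weight_kg / height_m ** 2

def bmi_category(value):
    # Commonly used clinical thresholds.
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "normal"
    if value < 30:
        return "overweight"
    return "obese"

score = bmi(weight_kg=85, height_m=1.8)        # raw features in, derived feature out
print(round(score, 1), bmi_category(score))    # 26.2 overweight
```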
As mentioned above, during a machine learning project a lot of time and effort is spent on identifying, or designing, (derived) features that are useful for the task the project is trying to solve. The advantage of deep learning is that it can learn useful derived features from data automatically (we will discuss how it does this in later chapters). Indeed, given large enough datasets, deep learning has proven to be so effective in learning features that deep learning models are now more accurate than many of the other machine learning models that use hand-engineered features. This is also why deep learning is so effective in domains where examples are described with very large numbers of features. Technically datasets that contain large numbers of features are called high-dimensional. For example, a dataset of photos with a feature for each pixel in a photo would be high-dimensional. In complex high-dimensional domains, it is extremely difficult to hand-engineer features: consider the challenges of hand-engineering features for face recognition or machine translation. So, in these complex domains, adopting a strategy whereby the features are automatically learned from a large dataset makes sense. Related to this ability to automatically learn useful features, deep learning also has the ability to learn complex nonlinear mappings between inputs and outputs; we will explain the concept of a nonlinear mapping in chapter 3, and in chapter 6 we will explain how these mappings are learned from data.
Summary and the Road Ahead
This chapter has focused on positioning deep learning within the broader field of machine learning. Consequently, much of this chapter has been devoted to introducing machine learning. In particular, the concept of a function as a deterministic mapping from inputs to outputs was introduced, and the goal of machine learning was explained as finding a function that matches the mappings from input features to the output features that are observed in the examples in the dataset.
Within this machine learning context, deep learning was introduced as the subfield of machine learning that focuses on the design and evaluation of training algorithms and model architectures for modern neural networks. One of the distinctive aspects of deep learning within machine learning is the approach it takes to feature design. In most machine learning projects, feature design is a human-intensive task that can require deep domain expertise and consume a lot of time and project budget. Deep learning models, on the other hand, have the ability to learn useful features from low-level raw data, and complex nonlinear mappings from inputs to outputs. This ability is dependent on the availability of large datasets; however, when such datasets are available, deep learning can frequently outperform other machine learning approaches. Furthermore, this ability to learn useful features from large datasets is why deep learning can often generate highly accurate models for complex domains, be it in machine translation, speech processing, or image or video processing. In a sense, deep learning has unlocked the potential of big data. The most noticeable impact of this development has been the integration of deep learning models into consumer devices. However, the fact that deep learning can be used to analyze massive datasets also has implications for our individual privacy and civil liberty (Kelleher and Tierney 2018). This is why understanding what deep learning is, how it works, and what it can and can't be used for, is so important. The road ahead is as follows:

• Chapter 2 introduces some of the foundational concepts of deep learning, including what a model is, how the parameters of a model can be set using data, and how we can create complex models by combining simple models.

• Chapter 3 explains what neural networks are, how they work, and what we mean by a deep neural network.

• Chapter 4 presents a history of deep learning. This history focuses on the major conceptual and technical breakthroughs that have contributed to the development of the field. In particular, it provides a context and explanation for why deep learning has seen such rapid development in recent years.

• Chapter 5 describes the current state of the field by introducing the two deep neural architectures that are the most popular today: convolutional neural networks and recurrent neural networks. Convolutional neural networks are ideally suited to processing image and video data. Recurrent neural networks are ideally suited to processing sequential data such as speech, text, or time-series data. Understanding the differences and commonalities across these two architectures will give you an awareness of how a deep neural network can be tailored to the characteristics of a specific type of data, and also an appreciation of the breadth of the design space of possible network architectures.

• Chapter 6 explains how deep neural network models are trained, using the gradient descent and backpropagation algorithms. Understanding these two algorithms will give you a real insight into the state of artificial intelligence. For example, it will help you to understand why, given enough data, it is currently possible to train a computer to do a specific task within a well-defined domain at a level beyond human capabilities, but also why a more general form of intelligence is still an open research challenge for artificial intelligence.

• Chapter 7 looks to the future of the field of deep learning. It reviews the major trends driving the development of deep learning at present, and how they are likely to contribute to the development of the field in the coming years. The chapter also discusses some of the challenges the field faces, in particular the challenge of understanding and interpreting how a deep neural network works.
2 Conceptual Foundations
This chapter introduces some of the foundational concepts that underpin deep learning. The aim of this chapter is to decouple the initial presentation of these concepts from the technical terminology used in deep learning, which is introduced in subsequent chapters.
A deep learning network is a mathematical model that is (loosely) inspired by the structure of the brain. Consequently, in order to understand deep learning it is helpful to have an intuitive understanding of what a mathematical model is, how the parameters of a model can be set, how we can combine (or compose) models, and how we can use geometry to understand how a model processes information.
What Is a Mathematical Model?
In its simplest form, a mathematical model is an equation that describes how one or more input variables are related to an output variable. In this form a mathematical model is the same as a function: a mapping from inputs to outputs.
In any discussion relating to models, it is important to remember the statement by George Box that all models are wrong but some are useful! For a model to be useful it must have a correspondence with the real world. This correspondence is most obvious in terms of the meaning that can be associated with a variable. For example, in isolation a value such as 78,000 has no meaning because it has no correspondence with concepts in the real world. But yearly income=$78,000 tells us how the number describes an aspect of the real world. Once the variables in a model have a meaning, we can understand the model as describing a process through which different aspects of the world interact and cause new events. The new events are then described by the outputs of the model.
A very simple template for a model is the equation of a line:

y = (m × x) + c

In this equation, y is the output variable, x is the input variable, and m and c are two parameters of the model that we can set to adjust the relationship the model defines between the input and the output.
Imagine we have a hypothesis that yearly income affects a person's happiness and we wish to describe the relationship between these two variables.1 Using the equation of a line, we could define a model to describe this relationship as follows:

happiness = (m × income) + c

This model has a meaning because the variables in the model (as distinct from the parameters of the model) have a correspondence with concepts from the real world. To complete our model, we have to set the values of the model's parameters: m and c. Figure 2.1 illustrates how varying the values of each of these parameters changes the relationship defined by the model between income and happiness.
One important thing to notice in this figure is that no matter what values we set the model parameters to, the relationship defined by the model between the input and the output variable can be plotted as a line. This is not surprising because we used the equation of a line as the template to define our model, and this is why mathematical models that are based on the equation of a line are known as linear models. The other important thing to notice in the figure is how changing the parameters of the model changes the relationship between income and happiness.
Figure 2.1 Three different linear models of how income affects happiness.
The solid steep line is a model of the world in which people with zero income have a happiness level of 1, and increases in income have a significant effect on people's happiness. The dashed line is a model in which people with zero income also have a happiness level of 1 and increased income increases happiness, but at a slower rate than in the world modeled by the solid line. Finally, the dotted line is a model of the world where no one is particularly unhappy (even people with zero income have a happiness of 4 out of 10), and although increases in income do affect happiness, the effect is moderate. This third model assumes that income has a relatively weak effect on happiness.
More generally, the differences between the three models in figure 2.1 show how making changes to the parameters of a linear model changes the model. Changing c causes the line to move up and down. This is most clearly seen if we focus on the y-axis: notice that the line defined by a model always crosses (or intercepts) the y-axis at the value that c is set to. This is why the c parameter in a linear model is known as the intercept. The intercept can be understood as specifying the value of the output variable when the input variable is zero. Changing the m parameter changes the angle (or slope) of the line. The slope parameter controls how quickly changes in income produce changes in happiness. In a sense, the slope value is a measure of how important income is to happiness. If income is very important (i.e., if small changes in income result in big changes in happiness), then the slope parameter of our model should be set to a large value. Another way of understanding this is to think of the slope parameter of a linear model as describing the importance, or weight, of the input variable in determining the value of the output.
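To see the roles of the slope and intercept numerically, the short sketch below evaluates the happiness model for three invented parameter settings (illustrative values only, not the particular settings plotted in figure 2.1):

```python
def linear_model(income, slope, intercept):
    return (slope * income) + intercept

for slope, intercept in [(0.8, 1), (0.4, 1), (0.2, 4)]:
    outputs = [linear_model(income, slope, intercept) for income in (0, 5, 10)]
    print(f"slope={slope}, intercept={intercept}: happiness at income 0, 5, 10 -> {outputs}")

# The intercept is the output when income is 0; the slope controls how quickly
# increases in income translate into increases in happiness.
```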
Linear Models with Multiple Inputs
The equation of a line can be used as a template for mathematical models that have more than one input variable. For example, imagine yourself in a scenario where you have been hired by a financial institution to act as a loan officer and your job involves deciding whether or not a loan application should be granted. From interviewing domain experts you come up with a hypothesis that a useful way to model a person's credit solvency is to consider both their yearly income and their current debts. If we assume that there is a linear relationship between these two input variables and a person's credit solvency, then the appropriate mathematical model, written out in English, would be:

credit solvency = (weight1 × income) + (weight2 × debt) + intercept

Notice that in this model the m parameter has been replaced by a separate weight for each input variable, with each weight representing the importance of its associated input in determining the output. In mathematical notation this model would be written as:

y = (w_1 × x_1) + (w_2 × x_2) + c

where y represents the credit solvency output, x_1 represents the income variable, x_2 represents the debt variable, and c represents the intercept. Using the idea of adding a new weight for each new input to the model allows us to scale the equation of a line to as many inputs as we like. All the models defined in this way are still linear within the dimensions defined by the number of inputs and the output. What this means is that a linear model with two inputs and one output defines a flat plane rather than a line, because that is what a two-dimensional line that has been extruded into three dimensions looks like.
It can become tedious to write out a mathematical model that has a lot of inputs, so mathematicians like to write things in as compact a form as possible. With this in mind, the above equation is sometimes written in the short form:

y = Σ_{i=1}^{n} (x_i × w_i) + c

This notation tells us that to calculate the output variable y we must first go through all the inputs and multiply each input by its corresponding weight, then sum together the results of these multiplications, and finally add the intercept parameter to the result of the summation. The Σ symbol tells us that we use addition to combine the results of the multiplications, and the index i tells us that we multiply each input by the weight with the same index. We can make our notation even more compact by treating the intercept as a weight. One way to do this is to assume an extra input x_0 that is always equal to 1 and to treat the intercept as the weight on this input, that is, w_0 = c. Doing this allows us to write out the model as follows:

y = Σ_{i=0}^{n} (x_i × w_i)

Notice that the index now starts at 0, rather than 1, because we are now assuming an extra input, x_0 = 1, and we have relabeled the intercept as w_0.
Although we can write down a linear model in a number of different ways, the core of a linear model is that the output is calculated as the sum of the n input values multiplied by their corresponding weights. Consequently, this type of model defines a calculation known as a weighted sum, because we weight each input and sum the results. Although a weighted sum is easy to calculate, it turns out to be very useful in many situations, and it is the basic calculation used in every neuron in a neural network.
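A weighted sum is short enough to write directly in code. The sketch below (with invented illustrative values) computes the same result twice: once with a separate intercept, and once with the intercept folded in as the weight on an extra input that is always 1:

```python
def weighted_sum(inputs, weights):
    return sum(x * w for x, w in zip(inputs, weights))

income, debt = 150, -100
weights = [3, 1]          # one weight per input (illustrative values)
intercept = 10            # illustrative intercept

print(weighted_sum([income, debt], weights) + intercept)       # 360
# The same model, with the intercept treated as the weight on a constant input of 1.
print(weighted_sum([1, income, debt], [intercept] + weights))  # 360
```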
Setting the Parameters of a Linear Model
Let us return to our working scenario where we wish to create a model that enables us to calculate the credit solvency of individuals who have applied for a financial loan. For simplicity in presentation we will ignore the intercept parameter in this discussion as it is treated the same as the other parameters (i.e., the weights on the inputs). So, dropping the intercept parameter, we have the following linear model (or weighted sum) relating a person's income and debt to their credit solvency:

credit solvency = (w_1 × income) + (w_2 × debt)
The multiplication of inputs by weights, followed by a summation, is known as a weighted sum.
In order to complete our model, we need to specify the parameters of the model; that is, we need to specify the value of the weight for each input. One way to do this would be to use our domain expertise to come up with values for each of the parameters.
For example, if we assume that an increase in a person's income has a bigger impact on their credit solvency than a similar increase in their debt, we should set the weighting for income to be larger than that of the debt. The following model encodes this assumption; in particular this model specifies that income is three times as important as debt in determining a person's credit solvency:

credit solvency = (3 × income) + (1 × debt)
The drawback with using domain knowledge to set the parameters of a model is that experts often disagree. For example, you may think that weighting income as three times as important as debt is not realistic; in that case the model can be adjusted by, for example, setting both income and debt to have an equal weighting, which would be equivalent to assuming that income and debt are equally important in determining credit solvency. One way to avoid arguments between experts is to use data to set the parameters. This is where machine learning helps. The learning done by machine learning is finding the parameters (or weights) of a model using a dataset.
Learning Model Parameters from Data
Later in the book we will describe the standard algorithm used to learn the weights for a linear model, known as the gradient descent algorithm. However, we can give a brief preview of the algorithm here. We start with a dataset containing a set of examples for which we have both the input values (income and debt) and the output value (credit solvency). Table 2.1 illustrates such a dataset from our credit solvency scenario.2
The learning done by machine learning is finding the parameters (or weights) of a model using a dataset.
We then begin the process of learning the weights by guessing initial values for each weight. It is very likely that this initial, guessed, model will be a very bad model. This is not a problem, however, because we will use the dataset to iteratively update the weights so that the model gets better and better, in terms of how well it matches the data. For the purpose of the example, we will use the model described above as our initial (guessed) model:

credit solvency = (3 × income) + (1 × debt)
Table 2.1. A dataset of loan applications and known credit solvency rating of the applicant
ID   Annual income   Current debt   Credit solvency
1    $150            -$100          100
2    $250            -$300          -50
3    $450            -$250          400
4    $200            -$350          -300
The general process for improving the weights of the model is to select an example from the dataset and feed the input values from the example into the model. This allows us to calculate an estimate of the output value for the example. Once we have this estimated output, we can calculate the error of the model on the example by subtracting the estimated output from the correct output for the example listed in the dataset. Using the error of the model on the example, we can improve how well the model fits the data by updating the weights in the model using the following strategy, or learning rule:

• If the error is 0, then we should not change the weights of the model.

• If the error is positive, then the output of the model was too low, so we should increase the output of the model for this example by increasing the weights for all the inputs that had positive values for the example and decreasing the weights for all the inputs that had negative values for the example.

• If the error is negative, then the output of the model was too high, so we should decrease the output of the model for this example by decreasing the weights for all the inputs that had positive values for the example and increasing the weights for all the inputs that had negative values for the example.
To illustrate the weight update process we will use example 1 from table 2.1 (income = 150, debt = -100, and solvency = 100) to test the accuracy of our guessed model and update the weights according to the resulting error.
When the input values for the example are passed into the model, the credit solvency estimate returned by the model is 350. This is larger than the credit solvency listed for this example in the dataset, which is 100. As a result, the error of the model is negative (100 - 350 = -250); therefore, following the learning rule described above, we should decrease the output of the model for this example by decreasing the weights for positive inputs and increasing the weights for negative inputs. For this example, the income input had a positive value and the debt input had a negative value. If we decrease the weight for income by 1 and increase the weight for debt by 1, we end up with the following model:

credit solvency = (2 × income) + (2 × debt)
We can test if this weight update has improved the model by checking if the new model generates a better estimate for the example than the old model. The following illustrates pushing the same example through the new model:

credit solvency = (2 × 150) + (2 × (-100)) = 300 - 200 = 100
This time the credit solvency estimate generated by the model matches the value in the dataset, showing that the updated model fits the data more closely than the original model. In fact, this new model generates the correct output for all the examples in the dataset.
In this example, we only needed to update the weights once in order to find a set of weights that made the behavior of the model consistent with all the examples in the dataset. Typically, however, it takes many iterations of presenting examples and updating weights to get a good model. Also, in this example, we have, for the sake of simplicity, assumed that the weights are updated by either adding or subtracting 1 from them. Generally, in machine learning, the calculation of how much to update each weight by is more complicated than this. However, these differences aside, the general process outlined here for updating the weights (or parameters) of a model in order to fit the model to a dataset is the learning process at the core of deep learning.
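The learning rule described above is simple enough to implement directly. The sketch below applies it to example 1 from table 2.1, starting from the guessed weights of 3 for income and 1 for debt and using the simplifying fixed step of 1:

```python
def estimate(inputs, weights):
    return sum(x * w for x, w in zip(inputs, weights))

def update_weights(inputs, weights, target, step=1):
    error = target - estimate(inputs, weights)
    if error == 0:
        return weights                             # the model already fits this example
    new_weights = []
    for x, w in zip(inputs, weights):
        if error > 0:                              # output too low: push it up
            new_weights.append(w + step if x > 0 else w - step)
        else:                                      # output too high: push it down
            new_weights.append(w - step if x > 0 else w + step)
    return new_weights

weights = [3, 1]                                   # initial (guessed) model
inputs, target = [150, -100], 100                  # example 1 from table 2.1
weights = update_weights(inputs, weights, target)
print(weights)                                     # [2, 2]
print(estimate(inputs, weights))                   # 100, which matches the target
```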
Combining Models
We now understand how we can specify a linear model to estimate an applicant’s credit solvency, and how we can modify the parameters of the model in order to fit the model to a dataset. However, as a loan officer our job is not simply to calculate an applicant’s credit solvency; we have to decide whether to grant the loan application or not. In other words, we need a rule that will take a credit solvency score as input and return a decision on the loan application. For example, we might use the decision rule that a person with a credit solvency above 200 will be granted a loan. This decision rule is also a model: it maps an input variable, in this case credit solvency, to an output variable, loan decision.
Using this decision rule we can adjudicate on a loan application by first using the model of credit solvency to convert a loan applicant's profile (described in terms of their annual income and debt) into a credit solvency score, and then passing the resulting credit solvency score through our decision rule model to generate the loan decision. We can write this process out in a pseudomathematical shorthand as follows:

loan decision = decision rule(weighted sum(income, debt))
Using this notation, the entire decision process for adjudicating the loan application for example 1 from table 2.1 is:

loan decision = decision rule(weighted sum(150, -100)) = decision rule(100) = reject
We are now in a position where we can use a model (composed of two simpler models, a decision rule and a weighted sum) to describe how a loan decision is made. What is more, if we use data from previous loan applications to set the parameters (i.e., the weights) of the model, our model will correspond to how we have processed previous loan applications. This is useful because we can use this model to process new applications in a way that is consistent with previous decisions. If a new loan application is submitted, we simply use our model to process the application and generate a decision. It is this ability to apply a mathematical model to new examples that makes mathematical modeling so useful.
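The composite decision process can be sketched as two small functions, one feeding the other. The weights and the threshold of 200 are taken from the worked example above:

```python
def credit_solvency(income, debt, weights=(2, 2)):
    # The weighted-sum model, with the weights learned in the worked example.
    return (weights[0] * income) + (weights[1] * debt)

def decision_rule(solvency, threshold=200):
    # The decision-rule model: grant the loan if solvency exceeds the threshold.
    return "grant" if solvency > threshold else "reject"

def loan_decision(income, debt):
    # The composite model: the output of one model is the input to the next.
    return decision_rule(credit_solvency(income, debt))

# The four applications from table 2.1.
for income, debt in [(150, -100), (250, -300), (450, -250), (200, -350)]:
    print(income, debt, loan_decision(income, debt))
# Only the third application produces a solvency score above 200.
```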
When we use the output of one model as the input to another model, we are creating a third model by combining two models. This strategy of building a complex model by combining smaller, simpler models is at the core of deep learning networks. As we will see, a neural network is composed of a large number of small units called neurons. Each of these neurons is a simple model in its own right that maps from a set of inputs to an output. The overall model implemented by the network is created by feeding the outputs from one group of neurons as inputs into a second group of neurons, then feeding the outputs of the second group of neurons as inputs to a third group of neurons, and so on, until the final output of the model is generated. The core idea is that feeding the outputs of some neurons as inputs to other neurons enables these subsequent neurons to learn to solve a different part of the overall problem the network is trying to solve by building on the partial solutions implemented by the earlier neurons, in a similar way to how the decision rule generates the final adjudication for a loan application by building on the calculation of the credit solvency model. We will return to this topic of model composition in subsequent chapters.
Input Spaces, Weight Spaces, and Activation Spaces
Although mathematical models can be written out as equations, it is often useful to understand the geometric meaning of a model. For example, the plots in figure 2.1 helped us understand how changes in the parameters of a linear model changed the relationship between the variables that the model defined. There are a number of geometric spaces that it is useful to distinguish between, and understand, when we are discussing neural networks. These are the input space, the weight space, and the activation space of a neuron. We can use the decision model for loan applications that we defined in the previous section to explain these three different types of spaces.
We will begin by describing the concept of an input space. Our loan decision model took two inputs: the annual income and current debt of the applicant. Table 2.1 listed these input values for four example loan applications. We can plot the input space of this model by treating each of the input variables as the axis of a coordinate system. This coordinate space is referred to as the input space because each point in this space defines a possible combination of input values to the model. For example, the plot at the top-left of figure 2.2 shows the position of each of the four example loan applications within the model's input space.
The weight space for a model describes the universe of possible weight combinations that a model might use. We can plot the weight space for a model by defining a coordinate system with one axis per weight in the model. The loan decision model has only two weights, one weight for the annual income input, and one weight for the current debt input. Consequently, the weight space for this model has two dimensions. The plot at the top-right of figure 2.2 illustrates a portion of the weight space for this model. The location of the weight combination used by the model is highlighted in this figure. Each point within this coordinate system describes a possible set of weights for the model, and therefore corresponds to a different weighted sum function within the model. Consequently, moving from one location to another within this weight space is equivalent to changing the model because it changes the mapping from inputs to output that the model defines.
Figure 2.2 There are four different coordinate spaces related to the processing of the loan decision model: top-left plots the input space; top-right plots the weight space; bottom-left plots the activation (or decision) space; and bottom-right plots the input space with the decision boundary plotted.
A linear model maps a set of input values to a point in a new space by applying a weighted sum calculation to the inputs: multiply each input by a weight, and sum the results of the multiplication. In our loan decision model it is in this space that we apply our decision rule. Thus, we could call this space the decision space, but, for reasons that will become clear when we describe the structure of a neuron in the next chapter, we call this space the activation space. The axes of a model's activation space correspond to the weighted inputs to the model. Consequently, each point in the activation space defines a set of weighted inputs. Applying a decision rule, such as our rule that a person with a credit solvency above 200 will be granted a loan, to each point in this activation space, and recording the result of the decision for each point, enables us to plot the decision boundary of the model in this space. The decision boundary divides the points in the activation space that exceed the threshold from the points that fall below the threshold. The plot in the bottom-left of figure 2.2 illustrates the activation space for our loan decision model. The positions of the four example loan applications listed in table 2.1 when they are projected into this activation space are shown. The diagonal black line in this figure shows the decision boundary. Using this threshold, loan application number three is granted and the other loan applications are rejected. We can, if we wish, project the decision boundary back into the original input space by recording for each location in the input space which side of the decision boundary in the activation space it is mapped to by the weighted sum function. The plot at the bottom-right of figure 2.2 shows the decision boundary in the original input space (note the change in the values on the axes) and was generated using this process. We will return to the concepts of weight spaces and decision boundaries in the next chapter when we describe how adjusting the parameters of a neuron changes the set of input combinations that cause the neuron to output a high activation.
Summary
The main idea presented in this chapter is that a linear mathematical model, be it expressed as an equation or plotted as a line, describes a relationship between a set of inputs and an output. Be aware that not all mathematical models are linear models, and we will come across nonlinear models in this book. However, the fundamental calculation of a weighted sum of inputs does define a linear model. Another big idea introduced in this chapter is that a linear model (a weighted sum) has a set of parameters, that is, the weights used in the weighted sum. By changing these parameters we can change the relationship the model describes between the inputs and the output. If we wish we could set these weights by hand using our domain expertise; however, we can also use machine learning to set the weights of the model so that the behavior of the model fits the patterns found in a dataset. The last big idea introduced in this chapter was that we can build complex models by combining simpler models. This is done by using the output from one (or more) models as input(s) to another model. We used this technique to define our composite model to make loan decisions. As we will see in the next chapter, the structure of a neuron in a neural network is very similar to the structure of this loan decision model. Just like this model, a neuron calculates a weighted sum of its inputs and then feeds the result of this calculation into a second model that decides whether the neuron activates or not.
The focus of this chapter has been to introduce some foundational concepts before we introduce the terminology of machine learning and deep learning. To give a quick overview of how the concepts introduced in this chapter map over to machine learning terminology, our loan decision model is equivalent to a two-input neuron that uses a threshold activation function. The two financial indicators (annual income and current debt) are analogous to the inputs the neuron receives. The terms input vector or feature vector are sometimes used to refer to the set of indicators describing a single example; in this context an example is a single loan applicant, described in terms of two features: annual income and current debt. Also, just like the loan decision model, a neuron associates a weight with each input. And, again, just like the loan decision model, a neuron multiplies each input by its associated weight and sums the results of these multiplications in order to calculate an overall score for the inputs. Finally, similar to the way we applied a threshold to the credit solvency score to convert it into a decision of whether to grant or reject the loan application, a neuron applies a function (known as an activation function) to convert the overall score of the inputs into the neuron's output. In the earliest types of neurons, these activation functions were actually threshold functions that worked in exactly the same way as the score threshold used in this credit scoring example. In more recent neural networks, different types of activation functions (for example, the logistic, tanh, or ReLU functions) are used. We will introduce these activation functions in the next chapter.
3 Neural Networks: The Building Blocks of Deep Learning
The term deep learning describes a family of neural network models that have multiple layers of simple information processing programs, known as neurons, in the network. The focus of this chapter is to provide a clear and comprehensive introduction to how these neurons work and are interconnected in artificial neural networks. In later chapters, we will explain how neural networks are trained using data.
A neural network is a computational model that is inspired by the structure of the human brain. The human brain is composed of a massive number of nerve cells, called neurons. In fact, some estimates put the number of neurons in the human brain at one hundred billion (Herculano-Houzel 2009). Neurons have a simple three-part structure consisting of: a cell body, a set of fibers called dendrites, and a single long fiber called an axon. Figure 3.1 illustrates the structure of a neuron and how it connects to other neurons in the brain. The dendrites and the axon stem from the cell body, and the dendrites of one neuron are connected to the axons of other neurons. The dendrites act as input channels to the neuron and receive signals sent from other neurons along their axons. The axon acts as the output channel of a neuron, and so other neurons, whose dendrites are connected to the axon, receive the signals sent along the axon as inputs.
Neurons work in a very simple manner. If the incoming stimuli are strong enough, the neuron transmits an electrical pulse, called an action potential, along its axon to the other neurons that are connected to it. So, a neuron acts as an all-or-none switch that takes in a set of inputs and either fires an action potential along its axon or produces no output.
This explanation of the human brain is a significant simplification of the biological reality, but it does capture the main points necessary to understand the analogy between the structure of the human brain and computational models called neural networks. These points of analogy are: (1) the brain is composed of a large number of interconnected and simple units called neurons; (2) the functioning of the brain can be understood as processing information, encoded as high or low electrical signals, or activation potentials, that spread across the network of neurons; and (3) each neuron receives a set of stimuli from its neighbors and maps these inputs to either a high- or low-value output. All computational models of neural networks have these characteristics.
Figure 3.1 The structure of a neuron in the brain.
Artificial Neural Networks
An artificial neural network consists of a network of simple information processing units, called neurons. The power of neural networks to model complex relationships is not the result of complex mathematical models, but rather emerges from the interactions between a large set of simple neurons.
Figure 3.2 illustrates the structure of a neural network. It is standard to think of the neurons in a neural network as organized into layers. The depicted network has five layers: one input layer, three hidden layers, and one output layer. A hidden layer is just a layer that is neither the input nor the output layer. Deep learning networks are neural networks that have many hidden layers of neurons. The minimum number of hidden layers necessary to be considered deep is two. However, most deep learning networks have many more than two hidden layers. The important point is that the depth of a network is measured in terms of the number of hidden layers, plus the output layer.
Deep learning networks are neural networks that have many hidden layers of neurons.
In figure 3.2, the squares in the input layer represent locations in memory that are used to present inputs to the network. These locations can be thought of as sensing neurons. There is no processing of information in these sensing neurons; the output of each of these neurons is simply the value of the data stored at the memory location. The circles in the figure represent the information processing neurons in the network. Each of these neurons takes a set of numeric values as input and maps them to a single output value. Each input to a processing neuron is either the output of a sensing neuron or the output of another processing neuron.
Figure 3.2 Topological illustration of a simple neural network.
The arrows in figure 3.2 illustrate how information flows through the network from the output of one neuron to the input of another neuron. Each connection in a network connects two neurons and each connection is directed, which means that information carried along a connection only flows in one direction. Each of the connections in a network has a weight associated with it. A connection weight is simply a number, but these weights are very important. The weight of a connection affects how a neuron processes the information it receives along the connection, and, in fact, training an artificial neural network, essentially, involves searching for the best (or optimal) set of weights.
How an Artificial Neuron Processes Information
The processing of information within a neuron, that is, the mapping from inputs to an output, is very similar to the loan decision model that we developed in chapter 2. Recall that the loan decision model first calculated a weighted sum over the input features (income and debt). The weights used in the weighted sum were adjusted using a dataset so that the result of the weighted sum calculation, given a loan applicant's income and debt as inputs, was an accurate estimate of the applicant's credit solvency score. The second stage of processing in the loan decision model involved passing the result of the weighted sum calculation (the estimated credit solvency score) through a decision rule. This decision rule was a function that mapped a credit solvency score to a decision on whether a loan application was granted or rejected.
A neuron also implements a two-stage process to map inputs to an output. The first stage of processing involves the calculation of a weighted sum of the inputs to the neuron. Then the result of the weighted sum calculation is passed through a second function that maps the result of the weighted sum to the neuron's final output value. When we are designing a neuron, we can use many different types of functions for this second stage of processing; it may be as simple as the decision rule we used for our loan decision model, or it may be more complex. Typically the output value of a neuron is known as its activation value, so this second function, which maps from the result of the weighted sum to the activation value of the neuron, is known as an activation function.
Figure 3.3 illustrates how these stages of processing are reflected in the structure of an artificial neuron. In figure 3.3, the Σ symbol represents the calculation of the weighted sum, and the φ symbol represents the activation function processing the weighted sum and generating the output from the neuron.
Figure 3.3 The structure of an artificial neuron.
The neuron in figure 3.3 receives n inputs on n different input connections, and each connection has an associated weight. The weighted sum calculation involves the multiplication of inputs by weights and the summation of the resulting values. Mathematically this calculation is written as:

z = (x_1 × w_1) + (x_2 × w_2) + ... + (x_n × w_n)

This calculation can also be written in a more compact mathematical form as:

z = Σ_{i=1}^{n} (x_i × w_i)

For example, assuming a neuron received the inputs [3, 9] and had the weights [-3, 1], the weighted sum calculation would be:

z = (3 × -3) + (9 × 1) = 0
The second stage of processing within a neuron is to pass the result of the weighted sum, the z value, through an activation function. Figure 3.4 plots the shape of a number of possible activation functions as the input to each function ranges across an interval, either [-1, …, +1] or [-10, …, +10] depending on which interval best illustrates the shape of the function. Figure 3.4 (top) plots a threshold activation function. The decision rule we used in the loan decision model was an example of a threshold function; the threshold used in that decision rule was whether the credit solvency score was above 200. Threshold activations were common in early neural network research. Figure 3.4 (middle) plots the logistic and tanh activation functions. The units employing these activation functions were popular in multilayer networks until quite recently. Figure 3.4 (bottom) plots the rectifier (or hinge, or positive linear) activation function. This activation function is very popular in modern deep learning networks; in 2011 the rectifier activation function was shown to enable better training in deep networks (Glorot et al. 2011). In fact, as will be discussed in the review of the history of deep learning in chapter 4, one of the trends in neural network research has been a shift from threshold activations to logistic and tanh activations, and then on to rectifier activation functions.
Figure 3.4 Top: threshold function; middle: logistic and tanh functions; bottom: rectified linear function.
Returning to the example, the result of the weighted summation step was z = 0. Figure 3.4 (middle plot, solid line) plots the logistic function. Assuming that the neuron is using a logistic activation function, this plot shows how the result of the summation will be mapped to an output activation: logistic(0) = 0.5. The calculation of the output activation of this neuron can be summarized as:

activation = logistic((3 × -3) + (9 × 1)) = logistic(0) = 0.5

Notice that the processing of information in this neuron is nearly identical to the processing of information in the loan decision model we developed in the last chapter. The major difference is that we have replaced the decision threshold rule that mapped the weighted sum score to an accepted or rejected output with a logistic function that maps the weighted sum score to a value between 0 and 1. Depending on the location of this neuron in the network, the output activation of the neuron, in this instance 0.5, will either be passed as input to one or more neurons in the next layer in the network, or will be part of the overall output of the network. If a neuron is at the output layer, the interpretation of what its output value means would be dependent on the task that the neuron is designed to model. If a neuron is in one of the hidden layers of the network, then it may not be possible to put a meaningful interpretation on the output of the neuron apart from the general interpretation that it represents some sort of derived feature (similar to the BMI feature we discussed in chapter 1) that the network has found useful in generating its outputs. We will return to the challenge of interpreting the meaning of activations within a neural network in chapter 7.
The key point to remember from this section is that a neuron, the fundamental building block of neural networks and deep learning, is defined by a simple two-step sequence of operations: calculating a weighted sum and then passing the result through an activation function.
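Those two steps can be written as a few lines of code. The sketch below reuses the worked example from this section, with the logistic function as the activation function:

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, activation=logistic):
    z = sum(x * w for x, w in zip(inputs, weights))   # step 1: weighted sum
    return activation(z)                              # step 2: activation function

# Inputs (3, 9) with weights (-3, 1) give z = 0, and logistic(0) = 0.5.
print(neuron([3, 9], [-3, 1]))   # 0.5
```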
Figure 3.4 illustrates that neither the tanh nor the logistic function is a linear function. In fact, the plots of both of these functions have a distinctive s-shaped (rather than linear) profile. Not all activation functions have an s-shape (for example, the threshold and rectifier are not s-shaped), but all activation functions do apply a nonlinear mapping to the output of the weighted sum. In fact, it is the introduction of the nonlinear mapping into the processing of a neuron that is the reason why activation functions are used.
Why Is an Activation Function Necessary?
To understand why a nonlinear mapping is needed in a neuron, it is first necessary to understand that, essentially, all a neural network does is define a mapping from inputs to outputs, be it from a game position in Go to an evaluation of that position, or from an X-ray to a diagnosis of a patient. Neurons are the basic building blocks of neural networks, and therefore they are the basic building blocks of the mapping a network defines. The overall mapping from inputs to outputs that a network defines is composed of the mappings from inputs to outputs that each of the neurons within the network implement. The implication of this is that if all the neurons within a network were restricted to linear mappings (i.e., weighted sum calculations), the overall network would be restricted to a linear mapping from inputs to outputs. However, many of the relationships in the world that we might want to model are nonlinear, and if we attempt to model these relationships using a linear model, then the model will be very inaccurate. Attempting to model a nonlinear relationship with a linear model would be an example of the underfitting problem we discussed in chapter 1: underfitting occurs when the model used to encode the patterns in a dataset is too simple and as a result it is not accurate.
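A quick numerical check makes this concrete: composing two weighted-sum layers collapses into a single weighted sum, whereas inserting a nonlinearity (here a ReLU) between them does not. The matrices and the input below are invented for illustration:

```python
import numpy as np

W1 = np.array([[1.0, -1.0],
               [-1.0, 1.0]])            # weights of the first layer
W2 = np.array([[1.0, 1.0]])             # weights of the second layer
x = np.array([2.0, 1.0])                # an input example

two_linear_layers = W2 @ (W1 @ x)
one_linear_layer = (W2 @ W1) @ x        # the same mapping collapsed into one layer
print(np.allclose(two_linear_layers, one_linear_layer))   # True: no extra power

relu = lambda z: np.maximum(z, 0)       # a nonlinear activation between the layers
with_activation = W2 @ relu(W1 @ x)
print(np.allclose(with_activation, one_linear_layer))     # False: a genuinely new mapping
```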
A linear relationship exists between two things when an increase in one always results in an increase or decrease in the other at a constant rate. For example, if an employee is on a fixed hourly rate, which does not vary at weekends or if they do overtime, then there is a linear relationship between the number of hours they work and their pay. A plot of their hours worked versus their pay will result in a straight line; the steeper the line the higher their fixed hourly rate of pay. However, if we make the payment system for our hypothetical employee just slightly more complex, by, for example, increasing their hourly rate of pay when they do overtime or work weekends, then the relationship between the number of hours they work and their pay is no longer linear. Neural networks, and in particular deep learning networks, are typically used to model relationships that are much more complex than this employee’s pay. Modeling these relationships accurately requires that a network be able to learn and represent complex nonlinear mappings. So, in order to enable a neural network to implement such nonlinear mappings, a nonlinear step (the activation function) must be included within the processing of the neurons in the network.
In principle, using any nonlinear function as an activation function enables a neural network to learn a nonlinear mapping from inputs to outputs. However, as we shall see later, most of the activation functions plotted in figure 3.4 have nice mathematical properties that are helpful when training a neural network, and this is why they are so popular in neural network research.
The fact that the introduction of a nonlinearity into the processing of the neurons enables the network to learn a nonlinear mapping between input(s) and output is another illustration of the fact that the overall behavior of the network emerges from the interactions of the processing carried out by individual neurons within the network. Neural networks solve problems using a divide-and-conquer strategy: each of the neurons in a network solves one component of the larger problem, and the overall problem is solved by combining these component solutions. An important aspect of the power of neural networks is that during training, as the weights on the connections within the network are set, the network is in effect learning a decomposition of the larger problem, and the individual neurons are learning how to solve and combine solutions to the components within this problem decomposition.
Within a neural network, some neurons may use different activation functions from other neurons in the network. Generally, however, all the neurons within a given layer of a network will be of the same type (i.e., they will all use the same activation function). Also, sometimes neurons are referred to as units, with a distinction made between units based on the activation function the units use: neurons that use a threshold activation function are known as threshold units, units that use a logistic activation function are known as logistic units, and neurons that use the rectifier activation function are known as rectified linear units, or ReLUs. For example, a network may have a layer of ReLUs connected to a layer of logistic units. The decision regarding which activation functions to use in the neurons in a network is made by the data scientist who is designing the network. To make this decision, a data scientist may run a number of experiments to test which activation functions give the best performance on a dataset. However, frequently data scientists default to using whichever activation function is popular at a given point. For example, currently ReLUs are the most popular type of unit in neural networks, but this may change as new activation functions are developed and tested. As we will discuss at the end of this chapter, the elements of a neural network that are set manually by the data scientist prior to the training process are known as hyperparameters.
Neural networks solve problems using a divide-and-conquer strategy: each of the neurons in a network solves one component of the larger problem, and the overall problem is solved by combining these component solutions.
The term hyperparameter is used to describe the manually fixed parts of the model in order to distinguish them from the parameters of the model, which are the parts of the model that are set automatically, by the machine learning algorithm, during the training process. The parameters of a neural network are the weights used in the weighted sum calculations of the neurons in the network. As we touched on in chapters 1 and 2, the standard training process for setting the parameters of a neural network is to begin by initializing the parameters (the network’s weights) to random values, and during training to use the performance of the network on the dataset to slowly adjust these weights so as to improve the accuracy of the model on the data. Chapter 6 describes the two algorithms that are most commonly used to train a neural network: the gradient descent algorithm and the backpropagation algorithm. What we will focus on next is understanding how changing the parameters of a neuron affects how the neuron responds to the inputs it receives.
How Does Changing the Parameters of a Neuron Affect Its Behavior?
The parameters of a neuron are the weights the neuron uses in the weighted sum calculation. Although the weighted sum calculation in a neuron is the same weighted sum used in a linear model, in a neuron the relationship between the weights and the final output of the neuron is more complex, because the result of the weighted sum is passed through an activation function in order to generate the final output. To understand how a neuron makes a decision on a given input, we need to understand the relationship between the neuron’s weights, the input it receives, and the output it generates in response.
The relationship between a neuron’s weights and the output it generates for a given input is most easily understood in neurons that use a threshold activation function. A neuron using this type of activation function is equivalent to our loan decision model that used a decision rule to classify the credit solvency scores, generated by the weighted sum calculation, to reject or grant loan applications. At the end of chapter 2, we introduced the concepts of an input space, a weight space, and an activation space (see figure 2.2). The input space for our two-input loan decision model could be visualized as a two-dimensional space, with one input (annual income) plotted along the x-axis, and the other input (current debt) on the y-axis. Each point in this plot defined a potential combination of inputs to the model, and the set of points in the input space defines the set of possible inputs the model could process. The weights used in the loan decision model can be understood as dividing the input space into two regions: the first region contains all of the inputs that result in the loan application being granted, and the other region contains all the inputs that result in the loan application being rejected. In that scenario, changing the weights used by the decision model would change the set of loan applications that were accepted or rejected. Intuitively, this makes sense because it changes the weighting that we put on an applicant’s income relative to their debt when we are deciding on granting the loan or not.
We can generalize the above analysis of the loan decision model to a neuron in a neural network. The equivalent neuron structure to the loan decision model is a two-input neuron with a threshold activation function. The input space for such a neuron has a similar structure to the input space for the loan decision model. Figure 3.5 presents three plots of the input space for a two-input neuron using a threshold function that outputs a high activation if the weighted sum result is greater than zero, and a low activation otherwise. The difference between the plots is that the neuron defines a different decision boundary in each case. In each plot, the decision boundary is marked with a black line.
Each of the plots in figure 3.5 was created by first fixing the weights of the neuron and then for each point in the input space recording whether the neuron returned a high or low activation when the coordinates of the point were used as the inputs to the neuron. The input points for which the neuron returned a high activation are plotted in gray, and the other points are plotted in white. The only difference between the neurons used to create these plots was the weights used in calculating the weighted sum of the inputs. The arrow in each plot illustrates the weight vector used by the neuron to generate the plot. In this context, a vector describes the direction and distance of a point from the origin.1 As we shall see, interpreting the set of weights used by a neuron as defining a vector (an arrow from the origin to the coordinates of the weights) in the neuron’s input space is useful in understanding how changes in the weights change the decision boundary of the neuron.
The weights used to create each plot change from one plot to the next. These changes are reflected in the direction of the arrow (the weight vector) in each plot. Specifically, changing the weights rotates the weight vector around the origin. Notice that the decision boundary in each plot is sensitive to the direction of the weight vector: in all the plots, the decision boundary is orthogonal (i.e., at a right, or 90°, angle) to the weight vector. So, changing the weights not only rotates the weight vector, it also rotates the decision boundary of the neuron. This rotation changes the set of inputs that the neuron outputs a high activation in response to (the gray regions).
To understand why this decision boundary is always orthogonal to the weight vector, we have to shift our perspective, for a moment, to linear algebra. Remember that every point in the input space defines a potential combination of input values to the neuron. Now, imagine each of these sets of input values as defining an arrow from the origin to the coordinates of the point in the input space. There is one arrow for each point in the input space. Each of these arrows is very similar to the weight vector, except that it points to the coordinates of the inputs rather than to the coordinates of the weights. When we treat a set of inputs as a vector, the weighted sum calculation is the same as multiplying two vectors, the input vector by the weight vector. In linear algebra terminology, multiplying two vectors is known as the dot product operation. For the purposes of this discussion, all we need to know about the dot product is that the result of this operation is dependent on the angle between the two vectors that are multiplied. If the angle between the two vectors is less than a right angle, then the result will be positive; otherwise, it will be negative. So, multiplying the weight vector by an input vector will return a positive value for all the input vectors at an angle less than a right angle to the weight vector, and a negative value for all the other vectors. The activation function used by this neuron returns a high activation when positive values are input and a low activation when negative values are input. Consequently, the decision boundary lies at a right angle to the weight vector because all the inputs at an angle less than a right angle to the weight vector will result in a positive input to the activation function and, therefore, trigger a high-output activation from the neuron; conversely, all the other inputs will result in a low-output activation from the neuron.
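A small numeric check of this argument may help (this is our own illustration; the weight and input values are arbitrary): the sign of the dot product between the weight vector and an input vector determines which side of the decision boundary the input falls on.

```python
import numpy as np

w = np.array([1.0, 1.0])              # weight vector (arbitrary choice)

inputs = {
    "same direction as w":  np.array([2.0, 1.0]),
    "orthogonal to w":      np.array([1.0, -1.0]),
    "opposite direction":   np.array([-1.0, -2.0]),
}

for name, x in inputs.items():
    z = np.dot(w, x)                  # weighted sum = dot product of weights and inputs
    activation = 1 if z > 0 else 0    # threshold activation
    print(f"{name}: dot product = {z:+.1f}, activation = {activation}")

# Inputs at less than a right angle to w give a positive dot product (high activation);
# inputs at more than a right angle give a negative one (low activation);
# the decision boundary itself is the set of inputs exactly at a right angle to w.
```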
Switching back to the plots in figure 3.5, although the decision boundaries in each of the plots are at different angles, all the decision boundaries go through the point in space that the weight vectors originate from (i.e., the origin). This illustrates that changing the weights of a neuron rotates the neuron’s decision boundary but does not translate it. Translating the decision boundary means moving the decision boundary up and down the weight vector, so that the point where it meets the vector is not the origin. The restriction that all decision boundaries must pass through the origin limits the distinctions that a neuron can learn between input patterns. The standard way to overcome this limitation is to extend the weighted sum calculation so that it includes an extra element, known as the bias term. This bias term is not the same as the inductive bias we discussed in chapter 1. It is more analogous to the intercept parameter in the equation of a line, which moves the line up and down the y-axis. The purpose of this bias term is to move (or translate) the decision boundary away from the origin.
The bias term is simply an extra value that is included in the calculation of the weighted sum. It is introduced into the neuron by adding the bias to the result of the weighted summation prior to passing it through the activation function. Here is the equation describing the processing stages in a neuron, with the bias term represented by the term b and the activation function written as φ:

\[ \mathit{output} = \varphi\Big(\Big(\sum_{i=1}^{n} w_i \times x_i\Big) + b\Big) \]
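As a sketch of this two-stage calculation (our own illustration; the particular weights, bias, and the choice of a ReLU activation are arbitrary):

```python
import numpy as np

def neuron_output(x, w, b, activation):
    # Stage 1: weighted sum of the inputs plus the bias term b.
    z = np.dot(w, x) + b
    # Stage 2: pass the result through the activation function.
    return activation(z)

relu = lambda z: max(0.0, z)

x = np.array([0.5, -1.0])   # example inputs
w = np.array([2.0, 1.0])    # example weights
b = -0.25                   # example bias term

print(neuron_output(x, w, b, relu))   # relu(2*0.5 + 1*(-1.0) - 0.25) = 0.0
```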
Figure 3.6 illustrates how the value of the bias term affects the decision boundary of a neuron. When the bias term is negative, the decision boundary is moved away from the origin in the direction that the weight vector points to (as in the top and middle plots in figure 3.6); when the bias term is positive, the decision boundary is translated in the opposite direction (see the bottom plot of figure 3.6). In both cases, the decision boundary remains orthogonal to the weight vector. Also, the size of the bias term affects the amount the decision boundary is moved from the origin; the larger the value of the bias term, the more the decision boundary is moved (compare the top plot of figure 3.6 with the middle and bottom plots).
Figure 3.6 Decision boundary plots for a two-input neuron that illustrate the effect of the bias term on the decision boundary. Top: weight vector [w1=1, w2=1] and bias equal to -1; middle: weight vector [w1=-2, w2=1] and bias equal to -2; bottom: weight vector [w1=1, w2=-2] and bias equal to 2.
Instead of manually setting the value of the bias term, it is preferable to allow a neuron to learn the appropriate bias. The simplest way to do this is to treat the bias term as a weight and allow the neuron to learn the bias term at the same time that it is learning the rest of the weights for its inputs. All that is required to achieve this is to augment all the input vectors the neuron receives with an extra input that is always set to 1. By convention, this input is input 0 (x0 = 1), and, consequently, the bias term is specified by weight 0 (w0).2 Figure 3.7 illustrates the structure of an artificial neuron when the bias term has been integrated as w0.
When the bias term has been integrated into the weights of a neuron, the equation specifying the mapping from input(s) to output activation of the neuron can be simplified (at least from a notational perspective) as follows:

\[ \mathit{output} = \varphi\Big(\sum_{i=0}^{n} w_i \times x_i\Big) \]
Notice that in this equation the index i goes from 0 to n, so that it now includes the fixed input, x0 = 1, and the bias term, w0; in the earlier version of this equation, the index only went from 1 to n. This new format means that the neuron is able to learn the bias term, simply by learning the appropriate weight w0, using the same process that is used to learn the weights for the other inputs: at the start of training, the bias term for each neuron in the network will be initialized to a random value and then adjusted, along with the weights of the network, in response to the performance of the network on the dataset.
Figure 3.7 An artificial neuron with a bias term included as w0.
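The following sketch shows that the two formulations give the same weighted sum (again our own illustration, reusing the arbitrary values from the previous sketch):

```python
import numpy as np

# Original formulation: weights for the real inputs plus a separate bias.
w = np.array([2.0, 1.0])
b = -0.25
x = np.array([0.5, -1.0])
z_separate = np.dot(w, x) + b

# Augmented formulation: prepend a fixed input x0 = 1 and treat the bias as weight w0.
w_aug = np.array([b, 2.0, 1.0])      # [w0, w1, w2], with w0 playing the role of the bias
x_aug = np.array([1.0, 0.5, -1.0])   # [x0, x1, x2], with x0 fixed to 1
z_augmented = np.dot(w_aug, x_aug)

print(z_separate, z_augmented)       # both are -0.25
```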
Accelerating Neural Network Training Using GPUs
Merging the bias term into the weights is more than a notational convenience; it enables us to use specialized hardware to accelerate the training of neural networks. The fact that a bias term can be treated the same as a weight means that the calculation of the weighted sum of inputs (including the addition of the bias term) can be treated as the multiplication of two vectors. As we discussed earlier, when explaining why the decision boundary is orthogonal to the weight vector, a set of inputs can be thought of as a vector. Recognizing that much of the processing within a neural network involves vector and matrix multiplications opens up the possibility of using specialized hardware to speed up these calculations. For example, graphics processing units (GPUs) are hardware components that have specifically been designed to do extremely fast matrix multiplications.
In a standard feedforward network, all the neurons in one layer receive all the outputs (i.e., activations) from all the neurons in the preceding layer. This means that all the neurons in a layer receive the same set of inputs. As a result, we can compute the weighted sums for all the neurons in a layer using a single vector-by-matrix multiplication, which is much faster than calculating a separate weighted sum for each neuron in the layer. To do this calculation of weighted sums for an entire layer of neurons in a single multiplication, we put the outputs from the neurons in the preceding layer into a vector and store all the weights of the connections between the two layers of neurons in a matrix. We then multiply the vector by the matrix, and the resulting vector contains the weighted sums for all the neurons.
Figure 3.8 illustrates how the weighted summation calculations for all the neurons in a layer in a network can be calculated using a single matrix multiplication operation. This figure is composed of two separate graphics: the graphic on the left illustrates the connections between neurons in two layers of a network, and the graphic on the right illustrates the matrix operation to calculate the weighted sums for the neurons in the second layer of the network. To help maintain a correspondence between the two graphics, the connections into neuron E are highlighted in the graphic on the left, and the calculation of the weighted sum in neuron E is highlighted in the graphic on the right.
Focusing on the graphic on the right, the vector (1 row, 3 columns) on the bottom-left of this graphic stores the activations for the neurons in layer 1 of the network; note that these activations are the outputs from an activation function (the particular activation function is not specified—it could be a threshold function, a tanh, a logistic function, or a rectified linear unit/ReLU function). The matrix (three rows and four columns), in the top-right of the graphic, holds the weights for the connections between the two layers of neurons. In this matrix, each column stores the weights for the connections coming into one of the neurons in the second layer of the network. The first column stores the weights for neuron D, the second column for neuron E, etc.3 Multiplying the vector of activations from layer 1 by the weight matrix results in a vector containing the weighted summations for the four neurons in layer 2 of the network: the first element of this vector is the weighted sum of inputs for neuron D, the second element is the weighted sum for neuron E, and so on.
To generate the vector containing the weighted summations for the neurons in layer 2, the activation vector is multiplied by each column in the matrix in turn. This is done by multiplying the first (leftmost) element in the vector by the first (topmost) element in the column, then multiplying the second element in the vector by the element in the second row in the column, and so on, until each element in the vector has been multiplied by its corresponding column element. Once all the multiplications between the vector and the column have been completed, the results are summed together and then stored in the output vector. Figure 3.8 illustrates the multiplication of the activation vector by the second column in the weight matrix (the column containing the weights for inputs to neuron E) and the storing of the sum of these multiplications as the second element of the output vector (the weighted sum for neuron E).
Figure 3.8 A graphical illustration of the topological connections of a specific neuron E in a network, and the corresponding vector by matrix multiplication that calculates the weighted summation of inputs for the neuron E, and its siblings in the same layer.5
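A sketch of the layer calculation illustrated in figure 3.8 (the activation and weight values below are invented; only the shapes, a 1-by-3 vector times a 3-by-4 matrix, follow the figure):

```python
import numpy as np

# Activations of the three neurons in layer 1 (a 1-row, 3-column vector).
activations = np.array([[0.2, 0.7, 0.1]])

# Weights of the connections into the four neurons in layer 2 (3 rows, 4 columns).
# Each column holds the weights into one of the four layer-2 neurons
# (the first column for neuron D, the second for neuron E, and so on).
weights = np.array([
    [0.5, -1.0, 0.3, 0.8],
    [0.1,  0.4, -0.2, 0.0],
    [-0.3, 0.9, 0.6, 1.2],
])

weighted_sums = activations @ weights   # one vector-by-matrix multiplication
print(weighted_sums.shape)              # (1, 4): one weighted sum per layer-2 neuron
print(weighted_sums)
```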
Indeed, the calculation implemented by an entire neural network can be represented as a chain of matrix multiplications, with an element-wise application of activation functions to the results of each multiplication. Figure 3.9 illustrates how a neural network can be represented both in graph form (on the left) and as a sequence of matrix operations (on the right). In the matrix representation, the symbol × represents standard matrix multiplication (described above), and the notation φ( ) represents the application of an activation function to each element in the vector created by the preceding matrix multiplication. The output of this element-wise application of the activation function is a vector containing the activations for the neurons in a layer of the network. To help show the correspondence between the two representations, both figures show the two inputs to the network, the activations from the three hidden units, and the overall output of the network.
Figure 3.9 A graph representation of a neural network (left), and the same network represented as a sequence of matrix operations (right).6
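And a sketch of the whole network in figure 3.9 as a chain of matrix multiplications, each followed by an element-wise activation (the weights are random and the bias terms are omitted; the shapes, two inputs, three hidden units, and one output, follow the figure):

```python
import numpy as np

logistic = lambda z: 1.0 / (1.0 + np.exp(-z))   # NumPy applies this element-wise

x = np.array([[0.5, -1.0]])       # the two inputs to the network (1 x 2)

W1 = np.random.randn(2, 3)        # weights between the inputs and the 3 hidden units
W2 = np.random.randn(3, 1)        # weights between the hidden units and the output

h = logistic(x @ W1)              # hidden-unit activations (1 x 3)
y = logistic(h @ W2)              # overall output of the network (1 x 1)

print(h, y)
```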
As a side note, the matrix representation provides a transparent view of the depth of a network; the network’s depth is counted as the number of layers that have a weight matrix associated with them (or equivalently, the depth of a network is the number of weight matrices required by the network). This is why the input layer is not counted when calculating the depth of a network: it does not have a weight matrix associated with it.
As mentioned above, the fact that the majority of calculations in a neural network can be represented as a sequence of matrix operations has important computational implications for deep learning. A neural network may contain over a million neurons, and the current trend is for the size of these networks to double every two to three years.4 Furthermore, deep learning networks are trained by iteratively running a network on examples sampled from very large datasets and then updating the network parameters (i.e., the weights) to improve performance. Consequently, training a deep learning network can require very large numbers of network runs, with each network run requiring millions of calculations. This is why computational speedups, such as those that can be achieved by using GPUs to perform matrix multiplications, have been so important for the development of deep learning.
The relationship between GPUs and deep learning is not one-way. The growth in demand for GPUs generated by deep learning has had a significant impact on GPU manufacturers. Deep learning has resulted in these companies refocusing their business. Traditionally, these companies would have focused on the computer games market, since the original motivation for developing GPU chips was to improve graphics rendering, and this had a natural application to computer games. However, in recent years these companies have focused on positioning GPUs as hardware for deep learning and artificial intelligence applications. Furthermore, GPU companies have also invested to ensure that their products support the top deep learning software frameworks.
Summary
The primary theme in this chapter has been that deep learning networks are composed of large numbers of simple processing units that work together to learn and implement complex mappings from large datasets. These simple units, neurons, execute a two-stage process: first, a weighted summation over the inputs to the neuron is calculated, and second, the result of the weighted summation is passed through a nonlinear function, known as an activation function. The fact that a weighted summation function can be efficiently calculated across a layer of neurons using a single matrix multiplication operation is important: it means that neural networks can be understood as a sequence of matrix operations; this has permitted the use of GPUs, hardware optimized to perform fast matrix multiplication, to speed up the training of networks, which in turn has enabled the size of networks to grow.
The compositional nature of neural networks means that it is possible to understand at a very fundamental level how a neural network operates. Providing a comprehensive description of this level of processing has been the focus of this chapter. However, the compositional nature of neural networks also raises a raft of questions in relation to how a network should be composed to solve a given task, for example:
• Which activation functions should the neurons in a network use?
• How many layers should there be in a network?
• How many neurons should there be in each layer?
• How should the neurons be connected together?
Unfortunately, many of these questions cannot be answered from first principles. In machine learning terminology, the quantities these questions are about are known as hyperparameters, as distinct from model parameters. The parameters of a neural network are the weights on the edges, and these are set by training the network using large datasets. By contrast, hyperparameters are the parameters of a model (in this case, of a neural network architecture) and/or training algorithm that cannot be directly estimated from the data but instead must be specified by the person creating the model, through heuristic rules, intuition, or trial and error. Often, much of the effort that goes into the creation of a deep learning network involves experimental work to answer these questions about hyperparameters, a process known as hyperparameter tuning. The next chapter will review the history and evolution of deep learning, and the challenges posed by many of these questions are themes running through the review. Subsequent chapters in the book will explore how answering these questions in different ways can create networks with very different characteristics, each suited to different types of tasks. For example, recurrent neural networks are best suited to processing sequential/time-series data, whereas convolutional neural networks were originally developed to process images. Both of these network types are, however, built using the same fundamental processing unit, the artificial neuron; the differences in the behavior and abilities of these networks stem from how these neurons are arranged and composed.
4 A Brief History of Deep Learning
The history of deep learning can be described as three major periods of excitement and innovation, interspersed with periods of disillusionment. Figure 4.1 shows a timeline of this history, which highlights these periods of major research: on threshold logic units (early 1940s to the mid-1960s), connectionism (early 1980s to the mid-1990s), and deep learning (mid-2000s to the present). Figure 4.1 distinguishes some of the primary characteristics of the networks developed in each of these three periods. The changes in these network characteristics highlight some of the major themes within the evolution of deep learning, including: the shift from binary to continuous values; the move from threshold activation functions, to logistic and tanh activations, and then on to ReLU activations; and the progressive deepening of the networks, from single layers, to multiple layers, and then on to deep networks. Finally, the upper half of figure 4.1 presents some of the important conceptual breakthroughs, training algorithms, and model architectures that have contributed to the evolution of deep learning.
Figure 4.1 provides a map of the structure of this chapter, with the sequence of concepts introduced in the chapter generally following the chronology of this timeline. The two gray rectangles in figure 4.1 represent the development of two important deep learning network architectures: convolutional neural networks (CNNs), and recurrent neural networks (RNNs). We will describe the evolution of these two network architectures in this chapter, and chapter 5 will give a more detailed explanation of how these networks work.
Figure 4.1 History of Deep Learning.
Early Research: Threshold Logic Units
In some of the literature on deep learning, the early neural network research is categorized as being part of cybernetics, a field of research concerned with developing computational models of control and learning in biological systems. However, in figure 4.1, following the terminology used in Nilsson (1965), this early work is categorized as research on threshold logic units because this term transparently describes the main characteristics of the systems developed during this period. Most of the models developed in the 1940s, ’50s, and ’60s processed Boolean inputs (true/false represented as +1/-1 or 1/0) and generated Boolean outputs. They also used threshold activation functions (introduced in chapter 3), and were restricted to single-layer networks; in other words, they were restricted to a single matrix of tunable weights. Frequently, the focus of this early research was on understanding whether computational models based on artificial neurons had the capacity to learn logical relations, such as conjunction or disjunction.
In 1943, Walter McCulloch and Walter Pitts published an influential computational model of biological neurons in a paper entitled “A Logical Calculus of the Ideas Immanent in Nervous Activity” (McCulloch and Pitts 1943). The paper highlighted the all-or-none characteristic of neural activity in the brain and set out to mathematically describe neural activity in terms of a calculus of propositional logic. In the McCulloch and Pitts model, all the inputs and the output to a neuron were either 0 or 1. Furthermore, each input was either excitatory (having a weight of +1) or inhibitory (having a weight of -1). A key concept introduced in the McCulloch and Pitts model was a summation of inputs followed by a threshold function applied to the result of the summation. In the summation, if an excitatory input was on, it added 1; if an inhibitory input was on, it subtracted 1. If the result of the summation was above a preset threshold, then the output of the neuron was 1; otherwise, it output 0. In the paper, McCulloch and Pitts demonstrated how logical operations (such as conjunction, disjunction, and negation) could be represented using this simple model. The McCulloch and Pitts model integrated the majority of the elements that are present in the artificial neurons introduced in chapter 3. In this model, however, the neuron was fixed; in other words, the weights and threshold were set by hand.
In 1949, Donald O. Hebb published a book entitled The Organization of Behavior, in which he set out a neuropsychological theory (integrating psychology and the physiology of the brain) to explain general human behavior. The fundamental premise of the theory was that behavior emerged through the actions and interactions of neurons. For neural network research, the most important idea in this book was a postulate, now known as Hebb’s postulate, which explained the creation of lasting memory in animals based on a process of changes to the connections between neurons: “When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.” (Hebb 1949, p. 62)
This postulate was important because it asserted that information was stored in the connections between neurons (i.e., in the weights of a network), and furthermore that learning occurred by changing these connections based on repeated patterns of activation (i.e., learning can take place within a network by changing the weights of the network).
Rosenblatt’s Perceptron Training Rule
In the years following Hebb’s publication, a number of researchers proposed computational models of neuron activity that integrated the Boolean threshold activation units of McCulloch and Pitts, with a learning mechanism based on adjusting the weights applied to the inputs. The best known of these models was Frank Rosenblatt’s perceptron model (Rosenblatt 1958). Conceptually, the perceptron model can be understood as a neural network consisting of a single artificial neuron that uses a threshold activation unit. Importantly, a perceptron network only has a single layer of weights. The first implementation of a perceptron was a software implementation on an IBM 704 system (and this was probably the first implementation of any neural network). However, Rosenblatt always intended the perceptron to be a physical machine and it was later implemented in custom-built hardware known as the “Mark 1 perceptron.” The Mark 1 perceptron received input from a camera that generated a 400-pixel image that was passed into the machine via an array of 400 photocells that were in turn connected to the neurons. The weights on connections to the neurons were implemented using adjustable electrical resistors known as potentiometers, and weight adjustments were implemented by using electric motors to adjust the potentiometers.
Rosenblatt proposed an error-correcting training procedure for updating the weights of a perceptron so that it could learn to distinguish between two classes of input: inputs for which the perceptron should produce the output +1, and inputs for which the perceptron should produce the output -1 (Rosenblatt 1960). The training procedure assumes a set of Boolean-encoded input patterns, each with an associated target output. At the start of training, the weights in the perceptron are initialized to random values. Training then proceeds by iterating through the training examples, and after each example has been presented to the network, the weights of the network are updated based on the error between the output generated by the perceptron and the target output specified in the data. The training examples can be presented to the network in any order, and examples may be presented multiple times before training is completed. A complete training pass through the set of examples is known as an iteration, and training terminates when the perceptron correctly classifies all the examples in an iteration.
Rosenblatt defined a learning rule (known as the perceptron training rule) to update each weight in a perceptron after a training example has been processed. The strategy the rule used to update the weights is the same as the three-condition strategy we introduced in chapter 2 to adjust the weights in the loan decision model:
1. If the output of the model for an example matches the output specified for that example in the dataset, then don’t update the weights.
2. If the output of the model is too low for the current example, then increase the output of the model by increasing the weights for the inputs that had a positive value for the example and decreasing the weights for the inputs that had a negative value for the example.
3. If the output of the model is too high for the current example, then reduce the output of the model by decreasing the weights for the inputs that had a positive value and increasing the weights for the inputs that had a negative value for the example.
Written out in an equation, Rosenblatt’s learning rule updates a weight ($w_i$) as:

\[ w_i^{t+1} = w_i^{t} + \eta \times (y_t - \hat{y}_t) \times x_{t,i} \]
In this rule, $w_i^{t+1}$ is the value of weight i after the network weights have been updated in response to the processing of example t, $w_i^{t}$ is the value of weight i used during the processing of example t, $\eta$ is a preset positive constant (known as the learning rate, discussed below), $y_t$ is the expected output for example t as specified in the training dataset, $\hat{y}_t$ is the output generated by the perceptron for example t, and $x_{t,i}$ is the component of input t that was weighted by $w_i$ during the processing of the example.
Although it may look complex, the perceptron training rule is in fact just a mathematical specification of the three-condition weight update strategy described above. The primary part of the equation to understand is the calculation of the difference between the expected output and what the perceptron actually predicted: $(y_t - \hat{y}_t)$. The outcome of this subtraction tells us which of the three update conditions we are in. In understanding how this subtraction works, it is important to remember that for a perceptron model the desired output is always either +1 or -1. The first condition is when $y_t = \hat{y}_t$; then the output of the perceptron is correct and the weights are not changed.
The second weight update condition is when the output of the perceptron is too large. This condition can only occur when the correct output for example t is -1, and so it is triggered when $y_t = -1$. In this case, if the perceptron output for the example is +1, then the error term is negative ($y_t - \hat{y}_t = -2$) and the weight is updated by $w_i^{t+1} = w_i^{t} + \eta \times (-2) \times x_{t,i}$. Assuming, for the purpose of this explanation, that $\eta$ is set to 0.5, this weight update simplifies to $w_i^{t+1} = w_i^{t} - x_{t,i}$. In other words, when the perceptron’s output is too large, the weight update rule subtracts the input values from the weights. This will decrease the weights on inputs with positive values for the example, and increase the weights on inputs with negative values for the example (subtracting a negative number is the same as adding a positive number).
The third weight update condition is when the output of the perceptron is too small. This weight update condition is the exact opposite of the second. It can only occur when the correct output for example t is +1, and so it is triggered when $y_t = +1$ and the perceptron outputs -1. In this case the error term is positive ($y_t - \hat{y}_t = +2$), and the weight is updated by $w_i^{t+1} = w_i^{t} + \eta \times (+2) \times x_{t,i}$. Again assuming that $\eta$ is set to 0.5, this update simplifies to $w_i^{t+1} = w_i^{t} + x_{t,i}$, which highlights that when the error of the perceptron is positive, the rule updates the weight by adding the input to the weight. This has the effect of decreasing the weights on inputs with negative values for the example and increasing the weights on inputs with positive values for the example.
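Putting the three conditions together, here is a minimal sketch of the perceptron training procedure as described above (the OR dataset, learning rate, and iteration limit are our own choices; the update line follows the equation given earlier):

```python
import numpy as np

def train_perceptron(X, targets, learning_rate=0.5, max_iterations=100):
    # X: examples as rows (already augmented with a fixed input of 1 for the bias).
    # targets: desired outputs, each +1 or -1.
    weights = np.random.uniform(-0.5, 0.5, X.shape[1])    # random initial weights
    for _ in range(max_iterations):
        errors = 0
        for x, target in zip(X, targets):
            output = 1 if np.dot(weights, x) > 0 else -1   # threshold activation
            if output != target:
                # w_i <- w_i + eta * (target - output) * x_i
                weights += learning_rate * (target - output) * x
                errors += 1
        if errors == 0:            # every example classified correctly: converged
            return weights
    return weights

# Example: learning logical OR (a linearly separable function), with x0 = 1 as the bias input.
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
targets = np.array([-1, 1, 1, 1])
print(train_perceptron(X, targets))
```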
At a number of points in the preceding paragraphs we have referred to the learning rate, $\eta$. The purpose of the learning rate is to control the size of the adjustments that are applied to a weight. The learning rate is an example of a hyperparameter that is preset before the model is trained. There is a tradeoff in setting the learning rate:
• If the learning rate is too small, it may take a very long time for the training process to converge on an appropriate set of weights.
• If the learning rate is too large, the network’s weights may jump around the weight space too much and the training may not converge at all.
One strategy for setting the learning rate is to set it to a relatively small positive value (e.g., 0.01); another strategy is to initialize it to a larger value (e.g., 1.0) and then systematically reduce it as the training progresses.
To make this discussion regarding the learning rate more concrete, imagine you are trying to solve a puzzle that requires you to get a small ball to roll into a hole. You are able to control the direction and speed of the ball by tilting the surface that the ball is rolling on. If you tilt the surface too steeply, the ball will move very fast and is likely to go past the hole, requiring you to adjust the surface again, and if you overadjust you may end up repeatedly tilting the surface. On the other hand, if you only tilt the surface a tiny bit, the ball may not start to move at all, or it may move very slowly taking a long time to reach the hole. Now, in many ways the challenge of getting the ball to roll into the hole is similar to the problem of finding the best set of weights for a network. Think of each point on the surface the ball is rolling across as a possible set of network weights. The ball’s position at each point in time specifies the current set of weights of the network. The position of the hole specifies the optimal set of network weights for the task we are training the network to complete. In this context, guiding the network to the optimal set of weights is analogous to guiding the ball to the hole. The learning rate allows us to control how quickly we move across the surface as we search for the optimal set of weights. If we set the learning rate to a high value, we move quickly across the surface: we allow large updates to the weights at each iteration, so there are big differences between the network weights in one iteration and the next. Or, using our rolling ball analogy, the ball is moving very quickly, and just like in the puzzle when the ball is rolling too fast and passes the hole, our search process may be moving so fast that it misses the optimal set of weights. Conversely, if we set the learning rate to a low value, we move very slowly across the surface: we only allow small updates to the weights at each iteration; or, in other words, we only allow the ball to move very slowly. With a low learning rate, we are less likely to miss the optimal set of weights, but it may take an inordinate amount of time to get to them. The strategy of starting with a high learning rate and then systematically reducing it is equivalent to steeply tilting the puzzle surface to get the ball moving and then reducing the tilt to control the ball as it approaches the hole.
Rosenblatt proved that if a set of weights exists that enables the perceptron to classify all of the training examples correctly, then the perceptron training algorithm will eventually converge on such a set of weights. This finding is known as the perceptron convergence theorem (Rosenblatt 1962). The difficulty with training a perceptron, however, is that it may require a substantial number of iterations through the data before the algorithm converges. Furthermore, for many problems it is not known in advance whether an appropriate set of weights exists; consequently, if training has been going on for a long time, it is not possible to know whether the training process is simply taking a long time to converge and will eventually terminate, or whether it will never terminate.
The Least Mean Squares Algorithm
Around the same time that Rosenblatt was developing the perceptron, Bernard Widrow and Marcian Hoff were developing a very similar model called the ADALINE (short for adaptive linear neuron), along with a learning rule called the LMS (least mean square) algorithm (Widrow and Hoff 1960). An ADALINE network consists of a single neuron that is very similar to a perceptron; the only difference is that an ADALINE network does not use a threshold function. In fact, the output of an ADALINE network is just the weighted sum of the inputs. This is why it is known as a linear neuron: a weighted sum is a linear function (it defines a line), and so an ADALINE network implements a linear mapping from inputs to output. The LMS rule is nearly identical to the perceptron learning rule, except that the output of the perceptron for a given example is replaced by the weighted sum of the inputs:

\[ w_i^{t+1} = w_i^{t} + \eta \times \Big(y_t - \sum_{j} w_j \times x_{t,j}\Big) \times x_{t,i} \]
The logic of the LMS update rule is the same as that of the perceptron training rule. If the output is too large, then the weights that were applied to positive inputs contributed to making the output larger, so these weights should be decreased, while those that were applied to negative inputs should be increased, thereby reducing the output the next time this input pattern is received. By the same logic, if the output is too small, then the weights that were applied to positive inputs should be increased and those that were applied to negative inputs should be decreased.
If the output of the model is too large, then weights associated with positive inputs should be reduced, whereas if the output is too small, then these weights should be increased.
One of the important aspects of Widrow and Hoff’s work was to show that the LMS rule could be used to train a network to predict any numeric value, not just +1 or -1. The learning rule was called the least mean square algorithm because using the LMS rule to iteratively adjust the weights in a neuron is equivalent to minimizing the average squared error on the training set. Today, the LMS learning rule is sometimes called the Widrow-Hoff learning rule, after its inventors; however, it is more commonly called the delta rule because it uses the difference (or delta) between the desired output and the actual output to calculate the weight adjustments. In other words, the LMS rule specifies that a weight should be adjusted in proportion to the difference between the output of an ADALINE network and the desired output: if the neuron makes a large error, then the weights are adjusted by a large amount; if the neuron makes a small error, then the weights are adjusted by a small amount.
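A corresponding sketch of the LMS (delta) update for a single linear neuron (the data, target, and learning rate are invented; the rule follows the description above):

```python
import numpy as np

def lms_update(weights, x, target, learning_rate=0.1):
    output = np.dot(weights, x)                      # linear neuron: no threshold
    # w_i <- w_i + eta * (target - output) * x_i   (the delta rule)
    return weights + learning_rate * (target - output) * x

weights = np.zeros(3)
x = np.array([1.0, 0.5, -1.0])     # first element is the fixed bias input
target = 2.0                        # the desired output can be any number

for step in range(20):
    weights = lms_update(weights, x, target)
print(np.dot(weights, x))           # approaches 2.0 as the updates accumulate
```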
Today, the perceptron is recognized as an important milestone in the development of neural networks because it was the first neural network to be implemented. However, most modern algorithms for training neural networks are more similar to the LMS algorithm. The LMS algorithm attempts to minimize the mean squared error of the network. As will be discussed in chapter 6, technically this iterative error-reduction process involves a gradient descent down an error surface; and, today, nearly all neural networks are trained using some variant of gradient descent.
The XOR Problem
The success of Rosenblatt, Widrow and Hoff, and others, in demonstrating that neural network models could automatically learn to distinguish between different sets of patterns, generated a lot of excitement around artificial intelligence and neural network research. However, in 1969, Marvin Minsky and Seymour Papert published a book entitled Perceptrons, which, in the annals of neural network research, is credited with single-handedly destroying this early excitement and optimism (Minsky and Papert 1969). Admittedly, throughout the 1960s neural network research had suffered from a great deal of hype and a lack of success in fulfilling the correspondingly high expectations. However, Minsky and Papert’s book set out a very negative view of the representational power of neural networks, and after its publication funding for neural network research dried up.
Minsky and Papert’s book primarily focused on single layer perceptrons. Remember that a single layer perceptron is the same as a single neuron that uses a threshold activation function, and so a single layer perceptron is restricted to implementing a linear (straight-line) decision boundary.1 This means that a single layer perceptron can only learn to distinguish between two classes of inputs if it is possible to draw a straight line in the input space that has all of the examples of one class on one side of the line and all examples of the other class on the other side of the line. Minsky and Papert highlighted this restriction as a weakness of these models.
To understand Minsky and Papert’s criticism of single layer perceptrons, we must first understand the concept of a linearly separable function. We will use a comparison between the logical AND and OR functions and the logical XOR function to explain the concept of a linearly separable function. The AND function takes two inputs, each of which can be either TRUE or FALSE, and returns TRUE if both inputs are TRUE. The plot on the left of figure 4.2 shows the input space for the AND function and categorizes each of the four possible input combinations as either resulting in an output value of TRUE (shown in the figure by using a clear dot) or FALSE (shown in the figure by using black dots). This plot illustrates that it is possible to draw a straight line between the inputs for which the AND function returns TRUE, (T,T), and the inputs for which the function returns FALSE, {(F,F), (F,T), (T,F)}. The OR function is similar to the AND function, except that it returns TRUE if either or both inputs are TRUE. The middle plot in figure 4.2 shows that it is possible to draw a line that separates the inputs that the OR function classifies as TRUE, {(F,T), (T,F), (T,T)}, from those it classifies as FALSE, (F,F). It is because we can draw a single straight line in the input space of these functions that divides the inputs belonging to one category of output from the inputs belonging to the other output category that the AND and OR functions are linearly separable functions.
The XOR function is also similar in structure to the AND and OR functions; however, it only returns TRUE if one (but not both) of its inputs is TRUE. The plot on the right of figure 4.2 shows the input space for the XOR function and categorizes each of the four possible input combinations as returning either TRUE (shown in the figure by using a clear dot) or FALSE (shown in the figure by using black dots). Looking at this plot you will see that it is not possible to draw a straight line between the inputs the XOR function classifies as TRUE and those that it classifies as FALSE. It is because we cannot use a single straight line to separate the inputs belonging to different categories of outputs for the XOR function that this function is said to be a nonlinearly separable function. The fact that the XOR function is nonlinearly separable does not make the function unique, or even rare—there are many functions that are nonlinearly separable.
Figure 4.2 Illustrations of linearly separable (AND, OR) and nonlinearly separable (XOR) functions. In each plot, black dots represent inputs for which the function returns FALSE, and circles represent inputs for which the function returns TRUE. (T stands for true and F stands for false.)
The key criticism that Minsky and Papert made of single layer perceptrons was that these single layer models were unable to learn nonlinearly separable functions, such as the XOR function. The reason for this limitation is that the decision boundary of a perceptron is linear and so a single layer perceptron cannot learn to distinguish between the inputs that belong to one output category of a nonlinearly separable function from those that belong to the other category.
It was known at the time of Minsky and Papert’s publication that it was possible to construct neural networks that defined a nonlinear decision boundary, and thus learn nonlinearly separable functions (such as the XOR function). The key to creating networks with more complex (nonlinear) decision boundaries was to extend the network to have multiple layers of neurons. For example, figure 4.3 shows a two-layer network that implements the XOR function. In this network, the logical TRUE and FALSE values are mapped to numeric values: FALSE values are represented by 0, and TRUE values are represented by 1. In this network, units activate (output 1) if the weighted sum of their inputs is ≥ 1; otherwise, they output 0. Notice that the units in the hidden layer implement the logical AND and OR functions. These can be understood as intermediate steps to solving the XOR challenge. The unit in the output layer implements XOR by composing the outputs of these hidden units. In other words, the unit in the output layer returns TRUE only when the AND node is off (output = 0) and the OR node is on (output = 1). However, it wasn’t clear at the time how to train networks with multiple layers. Also, at the end of their book, Minsky and Papert argued that “in their judgment” the research on extending neural networks to multiple layers was “sterile” (Minsky and Papert 1969, sec. 13.2 page 23).
Figure 4.3 A network that implements the XOR function. All processing units use a threshold activation function with a threshold of ≥1.
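A sketch of the XOR network in figure 4.3, with hidden units that implement AND and OR and an output unit that combines them (the particular weights below are one choice consistent with the description and with the threshold of ≥ 1):

```python
import numpy as np

def threshold_unit(inputs, weights, threshold=1.0):
    # Outputs 1 if the weighted sum of inputs reaches the threshold, otherwise 0.
    return 1 if np.dot(weights, inputs) >= threshold else 0

def xor_network(x1, x2):
    # Hidden layer: one unit computes AND, the other computes OR.
    and_unit = threshold_unit([x1, x2], [0.5, 0.5])   # fires only when both inputs are 1
    or_unit = threshold_unit([x1, x2], [1.0, 1.0])    # fires when either input is 1
    # Output layer: fires when OR is on and AND is off, i.e., exactly one input is 1.
    return threshold_unit([and_unit, or_unit], [-1.0, 1.0])

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_network(x1, x2))   # outputs 0, 1, 1, 0
```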
In a somewhat ironic historical twist, contemporaneous with Minsky and Papert’s publication, Alexey Ivakhnenko, a Ukrainian researcher, proposed the group method of data handling (GMDH), and in 1971 published a paper that described how it could be used to learn a neural network with eight layers (Ivakhnenko 1971). Today Ivakhnenko’s 1971 GMDH network is credited with being the first published example of a deep network trained from data (Schmidhuber 2015). However, for many years, Ivakhnenko’s accomplishment was largely overlooked by the wider neural network community. As a consequence, very little of the current work in deep learning uses the GMDH method for training: in the intervening years other training algorithms, such as backpropagation (described below), became standard in the community. At the same time as Ivakhnenko’s overlooked accomplishment, Minsky and Papert’s critique was proving persuasive, and it heralded the end of the first period of significant research on neural networks.
This first period of neural network research did, however, leave a legacy that shaped the development of the field up to the present day. The basic internal structure of an artificial neuron was defined: a weighted sum of inputs fed through an activation function. The concept of storing information within the weights of a network was developed. Furthermore, learning algorithms based on iteratively adapting weights were proposed, along with practical learning rules, such as the LMS rule. In particular, the LMS approach, of adjusting the weights of neurons in proportion to the difference between the output of the neuron and the desired output, is present in most modern training algorithms. Finally, there was recognition of the limitations of single layer networks, and an understanding that one way to address these limitations was to extend the networks to include multiple layers of neurons. At this time, however, it was unclear how to train networks with multiple layers. Updating a weight requires an understanding of how the weight affects the error of the network. For example, in the LMS rule, if the output of the neuron was too large, then the weights that were applied to positive inputs caused the output to increase; therefore, decreasing the size of these weights would reduce the output and thereby reduce the error. But, in the late 1960s, the question of how to model the relationship between the weights on the inputs to neurons in the hidden layers of a network and the overall error of the network was still unanswered; and, without this estimate of a weight’s contribution to the error, it was not possible to adjust the weights in the hidden layers of a network. The problem of attributing (or assigning) an amount of error to the components in a network is sometimes referred to as the credit assignment problem, or as the blame assignment problem.
Connectionism: Multilayer Perceptrons
In the 1980s, people began to reevaluate the criticisms of the late 1960s as being overly severe. Two developments, in particular, reinvigorated the field: (1) Hopfield networks; and (2) the backpropagation algorithm.
In 1982, John Hopfield published a paper where he described a network that could function as an associative memory (Hopfield 1982). During training, an associative memory learns a set of input patterns. Once the associative memory network has been trained, then, if a corrupted version of one of the input patterns is presented to the network, the network is able to regenerate the complete correct pattern. Associative memories are useful for a number of tasks, including pattern completion and error correction. Table 4.1 illustrates the tasks of pattern completion and error correction using the example of an associative memory that has been trained to store information on people’s birthdays. In a Hopfield network, the memories, or input patterns, are encoded as binary strings; and, assuming the binary patterns are relatively distinct from each other, a Hopfield network can store up to 0.138N of these strings, where N is the number of neurons in the network. So to store 10 distinct patterns requires a Hopfield network with 73 neurons, and to store 14 distinct patterns requires 100 neurons.
Table 4.1. Illustration of the uses of an associative memory for pattern completion and error correction

Training patterns:   John**12May   Kerry*03Jan   Liz***25Feb   Des***10Mar   Josef*13Dec

Pattern completion:  Liz***?????  →  Liz***25Feb
                     ???***10Mar  →  Des***10Mar

Error correction:    Kerry*01Apr  →  Kerry*03Jan
                     Jxsuf*13Dec  →  Josef*13Dec
Backpropagation and Vanishing Gradients
In 1986, a group of researchers known as the parallel distributed processing (PDP) research group published a two-book overview of neural network research (Rumelhart et al. 1986b, 1986c). These books proved to be incredibly popular, and chapter 8 in volume one described the backpropagation algorithm (Rumelhart et al. 1986a). The backpropagation algorithm has been invented a number of times,3 but it was this chapter by Rumelhart, Hinton, and Williams, published by PDP, that popularized its use. The backpropagation algorithm is a solution to the credit assignment problem and so it can be used to train a neural network that has hidden layers of neurons. The backpropagation algorithm is possibly the most important algorithm in deep learning. However, a clear and complete explanation of the backpropagation algorithm requires first explaining the concept of an error gradient, and then the gradient descent algorithm. Consequently, the in-depth explanation of backpropagation is postponed until chapter 6, which begins with an explanation of these necessary concepts. The general structure of the algorithm, however, can be described relatively quickly. The backpropagation algorithm starts by assigning random weights to each of the connections in the network. The algorithm then iteratively updates the weights in the network by showing training instances to the network and updating the network weights until the network is working as expected. The core algorithm works in a two-stage process. In the first stage (known as the forward pass), an input is presented to the network and the neuron activations are allowed to flow forward through the network until an output is generated. The second stage (known as the backward pass) begins at the output layer and works backward through the network until the input layer is reached. This backward pass begins by calculating an error for each neuron in the output layer. This error is then used to update the weights of these output neurons. Then the error of each output neuron is shared back (backpropagated) to the hidden neurons that connect to it, in proportion to the weights on the connections between the output neuron and the hidden neuron. Once this sharing (or blame assignment) has been completed for a hidden neuron, the total blame attributable to that hidden neuron is summed and this total is used to update the weights on that neuron. The backpropagation (or sharing back) of blame is then repeated for the neurons that have not yet had blame attributed to them. This process of blame assignment and weight updates continues back through the network until all the weights have been updated.
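As a rough illustration of this two-stage process (a highly simplified sketch, not the book’s presentation: a single training example, a single hidden layer, logistic activations, a squared-error measure, and no bias terms; chapter 6 gives the proper derivation):

```python
import numpy as np

logistic = lambda z: 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 2))      # weights into the two hidden neurons (random start)
W2 = rng.normal(size=(2, 1))      # weights into the single output neuron (random start)
learning_rate = 0.5

x = np.array([0.0, 1.0])          # one training example
target = np.array([1.0])          # the output we want the network to produce

for step in range(1000):
    # Forward pass: activations flow forward through the network.
    h = logistic(x @ W1)                      # hidden activations
    y = logistic(h @ W2)                      # network output

    # Backward pass: calculate an error signal for the output neuron...
    delta_out = (y - target) * y * (1 - y)
    # ...then share (backpropagate) that error to the hidden neurons,
    # in proportion to the weights connecting them to the output neuron.
    delta_hidden = (delta_out @ W2.T) * h * (1 - h)

    # Use the blame assigned to each neuron to update its incoming weights.
    W2 -= learning_rate * np.outer(h, delta_out)
    W1 -= learning_rate * np.outer(x, delta_hidden)

print(logistic(logistic(x @ W1) @ W2))        # close to the target of 1.0 after training
```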
A key innovation that enabled the backpropagation algorithm to work was a change in the activation functions used in the neurons. The networks that were developed in the early years of neural network research used threshold activation functions. The backpropagation algorithm does not work with threshold activation functions because backpropagation requires that the activation functions used by the neurons in the network be differentiable. Threshold activation functions are not differentiable because there is a discontinuity in the output of the function at the threshold. In other words, the slope of a threshold function at the threshold is infinite and therefore it is not possible to calculate the gradient of the function at that point. This led to the use of differentiable activation functions in multilayer neural networks, such as the logistic and tanh functions.
There is, however, an inherent limitation with using the backpropagation algorithm to train deep networks. In the 1980s, researchers found that backpropagation worked well with relatively shallow networks (one or two layers of hidden units), but that as the networks got deeper, the networks either took an inordinate amount of time to train, or else they entirely failed to converge on a good set of weights. In 1991, Sepp Hochreiter (working with Jürgen Schmidhuber) identified the cause of this problem in his diploma thesis (Hochreiter 1991). The problem is caused by the way the algorithm backpropagates errors. Fundamentally, the backpropagation algorithm is an implementation of the chain rule from calculus. The chain rule involves the multiplication of terms, and backpropagating an error from one neuron back to another can involve multiplying the error by a number of terms with values less than 1. These multiplications by values less than 1 happen repeatedly as the error signal gets passed back through the network. This results in the error signal becoming smaller and smaller as it is backpropagated through the network. Indeed, the error signal often diminishes exponentially with respect to the distance from the output layer. The effect of this diminishing error is that the weights in the early layers of a deep network are often adjusted by only a tiny (or zero) amount during each training iteration. In other words, the early layers either train very, very slowly or do not move away from their random starting positions at all. However, the early layers in a neural network are vitally important to the success of the network, because it is the neurons in these layers that learn to detect the features in the input that the later layers of the network use as the fundamental building blocks of the representations that ultimately determine the output of the network. For technical reasons, which will be explained in chapter 6, the error signal that is backpropagated through the network is in fact the gradient of the error of the network, and, as a result, this problem of the error signal rapidly diminishing to near zero is known as the vanishing gradient problem.
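A back-of-the-envelope illustration of the effect (our own, not from the book): if backpropagating through each layer multiplies the error signal by a factor smaller than 1, say 0.25 (the maximum slope of the logistic function), the signal shrinks exponentially with depth.

```python
error_signal = 1.0
factor_per_layer = 0.25   # assumed multiplier per layer (the logistic function's maximum slope)

for layer in range(1, 11):
    error_signal *= factor_per_layer
    print(f"after backpropagating through {layer} layers: {error_signal:.2e}")

# After 10 layers the signal is roughly a millionth of its original size,
# so the weights in the earliest layers barely move on each training iteration.
```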
Connectionism and Local versus Distributed Representations
Despite the vanishing gradient problem, the backpropagation algorithm opened up the possibility of training more complex (deeper) neural network architectures. This aligned with the principle of connectionism. Connectionism is the idea that intelligent behavior can emerge from the interactions between large numbers of simple processing units. Another aspect of connectionism was the idea of a distributed representation. A distinction can be made in the representations used by neural networks between localist and distributed representations. In a localist representation there is a one-to-one correspondence between concepts and neurons, whereas in a distributed representation each concept is represented by a pattern of activations across a set of neurons. Consequently, in a distributed representation each concept is represented by the activation of multiple neurons and the activation of each neuron contributes to the representation of multiple concepts.
To illustrate the distinction between localist and distributed representations, consider a scenario where (for some unspecified reason) a set of neuron activations is being used to represent the absence or presence of different foods. Furthermore, each food has two properties, the country of origin of the recipe and its taste. The possible countries of origin are: Italy, Mexico, or France; and the possible tastes are: Sweet, Sour, or Bitter. So, in total there are nine possible types of food: Italian+Sweet, Italian+Sour, Italian+Bitter, Mexican+Sweet, etc. Using a localist representation would require nine neurons, one neuron per food type. There are, however, a number of ways to define a distributed representation of this domain. One approach is to assign a binary number to each combination. This representation would require only four neurons, with the activation pattern 0000 representing Italian+Sweet, 0001 representing Italian+Sour, 0010 representing Italian+Bitter, and so on up to 1000 representing French+Bitter. This is a very compact representation. However, notice that in this representation the activation of each neuron in isolation has no independently meaningful interpretation: the rightmost neuron would be active (an activation pattern ending in 1) for Italian+Sour, Mexican+Sweet, Mexican+Bitter, and French+Sour, and without knowledge of the activation of the other neurons, it is not possible to know what country or taste is being represented. However, in a deep network the lack of semantic interpretability of the activations of hidden units is not a problem, so long as the neurons in the output layer of the network are able to combine these representations in such a way as to generate the correct output. Another, more transparent, distributed representation of this food domain is to use three neurons to represent the countries and three neurons to represent the tastes. In this representation, the activation pattern 100100 could represent Italian+Sweet, 001100 could represent French+Sweet, and 001001 could represent French+Bitter. In this representation, the activation of each neuron can be independently interpreted; however, the distribution of activations across the set of neurons is required in order to retrieve the full description of the food (country+taste). Notice, however, that both of these distributed representations are more compact than the localist representation. This compactness can significantly reduce the number of weights required in a network, and this in turn can result in faster training times for the network.
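The following sketch expresses the three encodings of the food domain directly in code, so the difference in compactness and interpretability is easy to inspect. The function names and the ordering of the food types are illustrative assumptions.

```python
import numpy as np

countries = ["Italian", "Mexican", "French"]
tastes = ["Sweet", "Sour", "Bitter"]
foods = [(c, t) for c in countries for t in tastes]   # nine food types

# Localist representation: one neuron per food type (nine neurons).
def localist(food):
    vec = np.zeros(len(foods))
    vec[foods.index(food)] = 1
    return vec

# Distributed representation 1: a binary code over four neurons.
def binary_code(food):
    idx = foods.index(food)
    return np.array([int(bit) for bit in format(idx, "04b")])

# Distributed representation 2: three country neurons plus three taste neurons.
def country_plus_taste(food):
    c, t = food
    vec = np.zeros(6)
    vec[countries.index(c)] = 1
    vec[3 + tastes.index(t)] = 1
    return vec

print(localist(("Italian", "Sour")))             # nine units, a single 1
print(binary_code(("Italian", "Sour")))          # [0 0 0 1]
print(country_plus_taste(("French", "Bitter")))  # [0 0 1 0 0 1]
```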
The concept of a distributed representation is very important within deep learning. Indeed, there is a good argument that deep learning might be more appropriately named representation learning—the argument being that the neurons in the hidden layers of a network are learning distributed representations of the input that are useful intermediate representations in the mapping from inputs to outputs that the network is attempting to learn. The task of the output layer of a network is then to learn how to combine these intermediate representations so as to generate the desired outputs. Consider again the network in figure 4.3 that implements the XOR function. The hidden units in this network learn an intermediate representation of the input, which can be understood as composed of the AND and OR functions; the output layer then combines this intermediate representation to generate the required output. In a deep network with multiple hidden layers, each subsequent hidden layer can be interpreted as learning a representation that is an abstraction over the outputs of the preceding layer. It is this sequential abstraction, through learning intermediate representations, that enables deep networks to learn such complex mappings from inputs to outputs.
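As an illustration of an intermediate representation, here is a small hand-wired network in the spirit of the XOR example: one hidden neuron computes AND, another computes OR, and the output neuron combines them into XOR. The specific weights below are illustrative assumptions and are not necessarily the weights learned by the network in figure 4.3.

```python
import numpy as np

def threshold(z):
    # Simple step activation: fire (1) if the weighted sum exceeds 0.
    return (z > 0).astype(float)

# Hand-set weights for a 2-2-1 network that computes XOR.
W_hidden = np.array([[1.0, 1.0],    # hidden neuron 1: AND
                     [1.0, 1.0]])   # hidden neuron 2: OR
b_hidden = np.array([-1.5, -0.5])   # AND fires only if both inputs are 1
# Output neuron computes OR AND (NOT AND), i.e., XOR.
W_out = np.array([-1.0, 1.0])
b_out = -0.5

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h = threshold(W_hidden @ np.array(x) + b_hidden)   # intermediate representation
    y = threshold(W_out @ h + b_out)
    print(x, "-> hidden (AND, OR):", h, "-> XOR:", y)
```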
Network Architectures: Convolutional and Recurrent Neural Networks
There are a considerable number of ways in which a set of neurons can be connected together. The network examples presented so far in the book have been connected together in a relatively uncomplicated manner: neurons are organized into layers and each neuron in a layer is directly connected to all of the neurons in the next layer of the network. These networks are known as feedforward networks because there are no loops within the network connections: all the connections point forward from the input toward the output. Furthermore, all of our network examples thus far would be considered to be fully connected, because each neuron is connected to all the neurons in the next layer. It is possible, and often useful, to design and train networks that are not feedforward and/or that are not fully connected. When done correctly, tailoring network architectures can be understood as embedding into the network architecture information about the properties of the problem that the network is trying to learn to model.
A very successful example of incorporating domain knowledge into a network by tailoring the network’s architecture is the design of convolutional neural networks (CNNs) for object recognition in images. In the 1960s, Hubel and Wiesel carried out a series of experiments on the visual cortex of cats (Hubel and Wiesel 1962, 1965). These experiments used electrodes inserted into the brains of sedated cats to study the response of the brain cells as the cats were presented with different visual stimuli. Examples of the stimuli used included bright spots or lines of light appearing at a location in the visual field, or moving across a region of the visual field. The experiments found that different cells responded to different stimuli at different locations in the visual field: in effect, a single cell in the visual cortex would be wired to respond to a particular type of visual stimulus occurring within a particular region of the visual field. The region of the visual field that a cell responded to was known as the receptive field of the cell. Another outcome of these experiments was the differentiation between two types of cells: “simple” and “complex.” For simple cells, the location of the stimulus is critical, with a slight displacement of the stimulus resulting in a significant reduction in the cell’s response. Complex cells, however, respond to their target stimuli regardless of where in the field of vision the stimulus occurs. Hubel and Wiesel (1965) proposed that complex cells behaved as if they received projections from a large number of simple cells, all of which respond to the same visual stimuli but differ in the position of their receptive fields. This hierarchy of simple cells feeding into complex cells results in a funneling of stimuli from large areas of the visual field, through a set of simple cells, into a single complex cell. Figure 4.4 illustrates this funneling effect. This figure shows a layer of simple cells, each monitoring a receptive field at a different location in the visual field. The receptive field of the complex cell covers the layer of simple cells, and this complex cell activates if any of the simple cells in its receptive field activates. In this way the complex cell can respond to a visual stimulus if it occurs at any location in the visual field.
Figure 4.4 The funneling effect of receptive fields created by the hierarchy of simple and complex cells.
In the late 1970s and early 1980s, Kunihiko Fukushima was inspired by Hubel and Wiesel’s analysis of the visual cortex and developed a neural network architecture for visual pattern recognition that was called the neocognitron (Fukushima 1980). The design of the neocognitron was based on the observation that an image recognition network should be able to recognize if a visual feature is present in an image irrespective of location in the image—or, to put it slightly more technically, the network should be able to do spatially invariant visual feature detection. For example, a face recognition network should be able to recognize the shape of an eye no matter where in the image it occurs, similar to the way a complex cell in Hubel and Wiesel’s hierarchical model could detect the presence of a visual feature irrespective of where in the visual field it occurred.
Fukushima realized that the functioning of the simple cells in the Hubel and Wiesel hierarchy could be replicated in a neural network using a layer of neurons that all use the same set of weights, but with each neuron receiving inputs from fixed small regions (receptive fields) at different locations in the input field. To understand the relationship between neurons sharing weights and spatially invariant visual feature detection, imagine a neuron that receives a set of pixel values, sampled from a region of an image, as its inputs. The weights that this neuron applies to these pixel values define a visual feature detection function that returns true (high activation) if a particular visual feature (pattern) occurs in the input pixels, and false otherwise. Consequently, if a set of neurons all use the same weights, they will all implement the same visual feature detector. If the receptive fields of these neurons are then organized so that together they cover the entire image, then if the visual feature occurs anywhere in the image at least one of the neurons in the group will identify it and activate.
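A one-dimensional sketch of this idea is shown below: a single set of shared weights (acting as one feature detector) is applied at every position of an input signal, so the same feature is detected wherever it occurs. The particular kernel values and input rows are illustrative assumptions.

```python
import numpy as np

# A single feature detector (weights shared by all neurons in the layer).
# This illustrative kernel responds to a dark-bright-dark pattern.
shared_weights = np.array([-1.0, 2.0, -1.0])

def detector_layer(row_of_pixels):
    # Each "neuron" looks at a small receptive field at a different position
    # but applies exactly the same weights.
    outputs = []
    for i in range(len(row_of_pixels) - len(shared_weights) + 1):
        receptive_field = row_of_pixels[i:i + len(shared_weights)]
        outputs.append(float(shared_weights @ receptive_field))
    return outputs

# The same bright-spot feature occurs at different positions in two inputs,
# and the shared-weight layer responds strongly at each location.
print(detector_layer(np.array([0, 1, 0, 0, 0, 0.0])))  # feature near the start
print(detector_layer(np.array([0, 0, 0, 0, 1, 0.0])))  # feature near the end
```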
Fukushima also recognized that the Hubel and Wiesel funneling effect (into complex cells) could be obtained by neurons in later layers also receiving as input the outputs from a fixed set of neurons in a small region of the preceding layer. In this way, the neurons in the last layer of the network each receive inputs from across the entire input field allowing the network to identify the presence of a visual feature anywhere in the visual input.
Some of the weights in the neocognitron were set by hand, and others were set using an unsupervised training process. In this training process, each time an example is presented to the network, a single layer of neurons that share the same weights is selected from the layers that yielded large outputs in response to the input. The weights of the neurons in the selected layer are updated so as to reinforce their response to that input pattern, and the weights of neurons not in the layer are not updated. In 1989 Yann LeCun developed the convolutional neural network (CNN) architecture specifically for the task of image processing (LeCun 1989). The CNN architecture shared many of the design features found in the neocognitron; however, LeCun showed how these types of networks could be trained using backpropagation. CNNs have proved to be incredibly successful in image processing and other tasks. A particularly famous CNN is the AlexNet network, which won the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) in 2012 (Krizhevsky et al. 2012). The goal of the ILSVRC competition is to identify objects in photographs. The success of AlexNet at the ILSVRC competition generated a lot of excitement about CNNs, and since AlexNet a number of other CNN architectures have won the competition. CNNs are one of the most popular types of deep neural networks, and chapter 5 will provide a more detailed explanation of them.
Recurrent neural networks (RNNs) are another example of a neural network architecture that has been tailored to the specific characteristics of a domain. RNNs are designed to process sequential data, such as language. An RNN network processes a sequence of data (such as a sentence) one input at a time. An RNN has only a single hidden layer. However, the output from each of these hidden neurons is not only fed forward to the output neurons, it is also temporarily stored in a buffer and then fed back into all of the hidden neurons at the next input. Consequently, each time the network processes an input, each neuron in the hidden layer receives both the current input and the output the hidden layer generated in response to the previous input. In order to understand this explanation, it may at this point be helpful to briefly skip forward to figure 5.2 to see an illustration of the structure of an RNN and the flow of information through the network. This recurrent loop, of activations from the output of the hidden layer for one input being fed back into the hidden layer alongside the next input, gives an RNN a memory that enables it to process each input in the context of the previous inputs it has processed.4 RNNs are considered deep networks because this evolving memory can be considered as deep as the sequence is long.
An early well-known RNN is the Elman network. In 1990, Jeffrey Locke Elman published a paper that described an RNN that had been trained to predict the endings of simple two- and three-word utterances (Elman 1990). The model was trained on a synthesized dataset of simple sentences generated using an artificial grammar. The grammar was built using a lexicon of twenty-three words, with each word assigned to a single lexical category (e.g., man=NOUN-HUM, woman=NOUN-HUM, eat=VERB-EAT, cookie=NOUN-FOOD, etc.). Using this lexicon, the grammar defined fifteen sentence generation templates (e.g., NOUN-HUM+VERB-EAT+NOUN-FOOD which would generate sentences such as man eat cookie). Once trained, the model was able to generate reasonable continuations for sentences, such as woman+eat+? = cookie. Furthermore, once the network was started, it was able to generate longer strings consisting of multiple sentences, using the context it generated itself as the input for the next word, as illustrated by this three-sentence example:
girl eat bread dog move mouse mouse move book
Although this sentence generation task was applied to a very simple domain, the ability of the RNN to generate plausible sentences was taken as evidence that neural networks could model linguistic productivity without requiring explicit grammatical rules. Consequently, Elman’s work had a huge impact on psycholinguistics and psychology. The following quote, from Churchland 1996, illustrates the importance that some researchers attributed to Elman’s work: The productivity of this network is of course a feeble subset of the vast capacity that any normal English speaker commands. But productivity is productivity, and evidently a recurrent network can possess it. Elman’s striking demonstration hardly settles the issue between the rule-centered approach to grammar and the network approach. That will be some time in working itself out. But the conflict is now an even one. I’ve made no secret where my own bets will be placed. (Churchland 1996, p. 143)5
Although RNNs work well with sequential data, the vanishing gradient problem is particularly severe in these networks. In 1997, Sepp Hochreiter and Jürgen Schmidhuber, the researchers who in 1991 had presented an explanation of the vanishing gradient problem, proposed long short-term memory (LSTM) units as a solution to this problem in RNNs (Hochreiter and Schmidhuber 1997). The name of these units draws on a distinction between how a neural network encodes long-term memory (understood as concepts that are learned over a period of time) through training and short-term memory (understood as the response of the system to immediate stimuli). In a neural network, long-term memory is encoded through adjusting the weights of the network, and once trained these weights do not change. Short-term memory is encoded in a network through the activations that flow through the network, and these activation values decay quickly. LSTM units are designed to enable the short-term memory (the activations) in the network to be propagated over long periods of time (or sequences of inputs). The internal structure of an LSTM is relatively complex, and we will describe it in chapter 5. The fact that LSTMs can propagate activations over long periods enables them to process sequences that include long-distance dependencies (interactions between elements in a sequence that are separated by two or more positions). For example, consider the dependency between the subject and the verb in an English sentence: The dog/dogs in that house is/are aggressive. This has made LSTM networks suitable for language processing, and for a number of years they have been the default neural network architecture for many natural language processing models, including machine translation. For example, the sequence-to-sequence (seq2seq) machine translation architecture introduced in 2014 connects two LSTM networks in sequence (Sutskever et al. 2014). The first LSTM network, the encoder, processes the input sequence one input at a time, and generates a distributed representation of that input. The first LSTM network is called an encoder because it encodes the sequence of words into a distributed representation. The second LSTM network, the decoder, is initialized with the distributed representation of the input and is trained to generate the output sequence one element at a time, using a feedback loop that feeds the most recent output element generated by the network back in as the input for the next time step. Today, this seq2seq architecture is the basis for most modern machine translation systems, and is explained in more detail in chapter 5.
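The following is a minimal PyTorch sketch of the encoder-decoder idea behind seq2seq. The vocabulary size, embedding and hidden dimensions, and the greedy decoding loop are illustrative assumptions; it is a sketch of the general architecture, not the configuration of any particular translation system, and a real system would also need to be trained on pairs of source and target sentences.

```python
import torch
import torch.nn as nn

VOCAB, EMB, HID = 1000, 32, 64   # illustrative sizes

class Seq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.src_embed = nn.Embedding(VOCAB, EMB)
        self.tgt_embed = nn.Embedding(VOCAB, EMB)
        self.encoder = nn.LSTM(EMB, HID, batch_first=True)
        self.decoder = nn.LSTM(EMB, HID, batch_first=True)
        self.project = nn.Linear(HID, VOCAB)   # scores over target words

    def forward(self, src_ids, start_id, max_len=20):
        # Encoder: read the whole input sequence and summarize it in the
        # final hidden/cell state (a distributed representation of the input).
        _, state = self.encoder(self.src_embed(src_ids))
        # Decoder: initialized with the encoder state, generate one output
        # word at a time, feeding each prediction back in as the next input.
        token = torch.full((src_ids.size(0), 1), start_id, dtype=torch.long)
        outputs = []
        for _ in range(max_len):
            dec_out, state = self.decoder(self.tgt_embed(token), state)
            scores = self.project(dec_out[:, -1, :])
            token = scores.argmax(dim=-1, keepdim=True)   # greedy choice
            outputs.append(token)
        return torch.cat(outputs, dim=1)

model = Seq2Seq()
src = torch.randint(0, VOCAB, (1, 7))   # a dummy seven-word source sentence
print(model(src, start_id=1).shape)     # -> torch.Size([1, 20])
```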
By the late 1990s, most of the conceptual requirements for deep learning were in place, including both the algorithms to train networks with multiple layers, and the network architectures that are still very popular today (CNNs and RNNs). However, the problem of the vanishing gradients still stifled the creation of deep networks. Also, from a commercial perspective, the 1990s (similar to the 1960s) experienced a wave of hype based on neural networks and unrealized promises. At the same time, a number of breakthroughs in other forms of machine learning models, such as the development of support vector machines (SVMs), redirected the focus of the machine learning research community away from neural networks: at the time SVMs were achieving similar accuracy to neural network models but were easier to train. Together these factors led to a decline in neural network research that lasted up until the emergence of deep learning.
The Era of Deep Learning
The first recorded use of the term deep learning is credited to Rina Dechter (1986), although in Dechter’s paper the term was not used in relation to neural networks; and the first use of the term in relation to neural networks is credited to Aizenberg et al. (2000).6 In the mid-2000s, interest in neural networks started to grow, and it was around this time that the term deep learning came to prominence to describe deep neural networks. The term deep learning is used to emphasize the fact that the networks being trained are much deeper than previous networks.
One of the early successes of this new era of neural network research was when Geoffrey Hinton and his colleagues demonstrated that it was possible to train a deep neural network using a process known as greedy layer-wise pretraining. Greedy layer-wise pretraining begins by training a single layer of neurons that receives input directly from the raw input. There are a number of different ways that this single layer of neurons can be trained, but one popular way is to use an autoencoder. An autoencoder is a neural network with three layers: an input layer, a hidden (encoding) layer, and an output (decoding) layer. The network is trained to reconstruct the inputs it receives in the output layer; in other words, the network is trained to output the exact same values that it received as input. A very important feature in these networks is that they are designed so that it is not possible for the network to simply copy the inputs to the outputs. For example, an autoencoder may have fewer neurons in the hidden layer than in the input and output layer. Because the autoencoder is trying to reconstruct the input at the output layer, the fact that the information from the input must pass through this bottleneck in the hidden layer forces the autoencoder to learn an encoding of the input data in the hidden layer that captures only the most important features in the input, and disregards redundant or superfluous information.7
Layer-Wise Pretraining Using Autoencoders
In layer-wise pretraining, the initial autoencoder learns an encoding for the raw inputs to the network. Once this encoding has been learned, the units in the hidden encoding layer are fixed, and the output (decoding) layer is thrown away. Then a second autoencoder is trained—but this autoencoder is trained to reconstruct the representation of the data generated by passing it through the encoding layer of the initial autoencoder. In effect, this second autoencoder is stacked on top of the encoding layer of the first autoencoder. This stacking of encoding layers is considered to be a greedy process because each encoding layer is optimized independently of the later layers; in other words, each autoencoder focuses on finding the best solution for its immediate task (learning a useful encoding for the data it must reconstruct) rather than trying to find a solution to the overall problem for the network.
Once a sufficient number8 of encoding layers have been trained, a tuning phase can be applied. In the tuning phase, a final network layer is trained to predict the target output for the network. Unlike the pretraining of the earlier layers of the network, the target output for the final layer is different from the input vector and is specified in the training dataset. The simplest tuning is where the pretrained layers are kept frozen (i.e., the weights in the pretrained layers don’t change during the tuning); however, it is also feasible to train the entire network during the tuning phase. If the entire network is trained during tuning, then the layer-wise pretraining is best understood as finding useful initial weights for the earlier layers in the network. Also, it is not necessary that the final prediction model that is trained during tuning be a neural network. It is quite possible to take the representations of the data generated by the layer-wise pretraining and use them as the input representation for a completely different type of machine learning algorithm, for example, a support vector machine or a nearest neighbor algorithm. This scenario is a very transparent example of how neural networks learn useful representations of data prior to the final prediction task being learned. Strictly speaking, the term pretraining describes only the layer-wise training of the autoencoders; however, the term is often used to refer to both the layer-wise training stage and the tuning stage of the model.
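The sketch below walks through the stages just described using PyTorch: train a first autoencoder with a bottleneck, freeze its encoder and discard its decoder, train a second autoencoder on the codes it produces, and finally tune an output layer on top of the frozen encoders. The layer sizes, training loop, and random toy data are illustrative assumptions rather than a prescription for real pretraining.

```python
import torch
from torch import nn, optim

def train(model, inputs, targets, steps=200):
    # Generic helper: fit `model` so that model(inputs) approaches `targets`.
    opt = optim.SGD(model.parameters(), lr=0.1)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(inputs), targets)
        loss.backward()
        opt.step()

x = torch.rand(100, 4)                      # toy raw inputs of length 4

# Stage 1: an autoencoder with a 3-unit bottleneck learns to reconstruct x.
enc1, dec1 = nn.Linear(4, 3), nn.Linear(3, 4)
train(nn.Sequential(enc1, nn.Sigmoid(), dec1), x, x)

# Stage 2: freeze the first encoding layer, throw away its decoder, and
# train a second autoencoder on the codes produced by the first one.
for p in enc1.parameters():
    p.requires_grad = False
codes1 = torch.sigmoid(enc1(x)).detach()
enc2, dec2 = nn.Linear(3, 2), nn.Linear(2, 3)
train(nn.Sequential(enc2, nn.Sigmoid(), dec2), codes1, codes1)

# Tuning: keep the pretrained (frozen) encoders and train only a final
# output layer to predict the target values from the learned codes.
y = torch.rand(100, 1)                      # toy target values
codes2 = torch.sigmoid(enc2(codes1)).detach()
out_layer = nn.Linear(2, 1)
train(out_layer, codes2, y)
```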
Figure 4.5 shows the stages in layer-wise pretraining. The figure on the left illustrates the training of the initial autoencoder where an encoding layer (the black circles) of three units is attempting to learn a useful representation for the task of reconstructing an input vector of length 4. The figure in the middle of figure 4.5 shows the training of a second autoencoder stacked on top of the encoding layer of the first autoencoder. In this autoencoder, a hidden layer of two units is attempting to learn an encoding for an input vector of length 3 (which in turn is an encoding of a vector of length 4). The grey background in each figure demarcates the components in the network that are frozen during this training stage. The figure on the right shows the tuning phase where a final output layer is trained to predict the target feature for the model. For this example, in the tuning phase the pretrained layers in the network have been frozen.
Figure 4.5 The pretraining and tuning stages in greedy layer-wise pretraining. Black circles represent the neurons whose training is the primary objective at each training stage. The gray background marks the components in the network that are frozen during each training stage.
Layer-wise pretraining was important in the evolution of deep learning because it was the first approach to training deep networks that was widely adopted.9 However, today most deep learning networks are trained without using layer-wise pretraining. In the mid-2000s, researchers began to appreciate that the vanishing gradient problem was not a strict theoretical limit, but was instead a practical obstacle that could be overcome. The vanishing gradient problem does not cause the error gradients to disappear entirely; there are still gradients being backpropagated through the early layers of the network, it is just that they are very small. Today, there are a number of factors that have been identified as important in successfully training a deep network.
Weight Initialization and ReLU Activation Functions
One factor that is important in successfully training a deep network is how the network weights are initialized. The principles controlling how weight initialization affects the training of a network are still not clear. There are, however, weight initialization procedures that have been empirically shown to help with training a deep network. Glorot initialization10 is a frequently used weight initialization procedure for deep networks. It is based on a number of assumptions but has empirical success to support its use. To get an intuitive understanding of Glorot initialization, consider the fact that there is typically a relationship between the magnitude of values in a set and the variance of the set: generally, the larger the values in a set, the larger the variance of the set. So, if the variance calculated on a set of gradients propagated through a layer at one point in the network is similar to the variance for the set of gradients propagated through another layer in a network, it is likely that the magnitude of the gradients propagated through both of these layers will also be similar. Furthermore, the variance of gradients in a layer can be related to the variance of the weights in the layer, so a potential strategy to maintain gradients flowing through a network is to ensure similar variances across each of the layers in a network. Glorot initialization is designed to initialize the weights in a network in such a way that all of the layers in a network will have a similar variance in terms of both forward-pass activations and the gradients propagated during the backward pass in backpropagation. Glorot initialization defines a heuristic rule to meet this goal that involves sampling the weights for a network using the following uniform distribution (where w is a weight on a connection between layer j and layer j+1 that is being initialized, U[-a, a] is the uniform distribution over the interval [-a, a], n_j is the number of neurons in layer j, and the notation w ~ U indicates that the value of w is sampled from the distribution U)11:

$$ w \sim U\left[-\frac{\sqrt{6}}{\sqrt{n_j + n_{j+1}}},\ \frac{\sqrt{6}}{\sqrt{n_j + n_{j+1}}}\right] $$
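A minimal sketch of this sampling rule is shown below, assuming the uniform form of Glorot initialization given above; the layer sizes are arbitrary.

```python
import numpy as np

def glorot_uniform(n_in, n_out, seed=0):
    # Sample weights from U[-a, a] with a = sqrt(6 / (n_in + n_out)), so that
    # activation and gradient variances stay roughly comparable across layers.
    rng = np.random.default_rng(seed)
    a = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-a, a, size=(n_out, n_in))

# Initialize the weight matrix for a layer with 100 inputs and 50 neurons.
W = glorot_uniform(100, 50)
print(W.shape, W.min(), W.max())   # all values lie within roughly [-0.2, 0.2]
```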
Another factor that contributes to the success or failure of training a deep network is the selection of the activation function used in the neurons. Backpropagating an error gradient through a neuron involves multiplying the gradient by the value of the derivative of the activation function at the activation value of the neuron recorded during the forward pass. The derivatives of the logistic and tanh activation functions have a number of properties that can exacerbate the vanishing gradient problem if they are used in this multiplication step. Figure 4.6 presents a plot of the logistic function and the derivative of the logistic function. The maximum value of the derivative is 0.25. Consequently, after an error gradient has been multiplied by the value of the derivative of the logistic function at the appropriate activation for the neuron, the maximum value the gradient will have is a quarter of the gradient prior to the multiplication. Another problem with using the logistic function is that there are large portions of the domain of the function where the function is saturated (returning values that are very close to 0 or 1), and the rate of change of the function in these regions is near zero; thus, the derivative of the function is near 0. This is an undesirable property when backpropagating error gradients because the error gradients will be forced to zero (or close to zero) when backpropagated through any neuron whose activation is within one of these saturated regions. In 2011 it was shown that switching to a rectified linear activation function, relu(z) = max(0, z), improved training for deep feedforward neural networks (Glorot et al. 2011). Neurons that use a rectified linear activation function are known as rectified linear units (ReLUs). One advantage of ReLUs is that the activation function is linear for the positive portion of its domain, with a derivative equal to 1. This means that gradients can flow easily through ReLUs that have positive activation. However, the drawback of ReLUs is that the gradient of the function for the negative part of its domain is zero, so ReLUs do not train in this portion of the domain. Although undesirable, this is not necessarily a fatal flaw for learning because when backpropagating through a layer of ReLUs the gradients can still flow through the ReLUs in the layer that have positive activation. Furthermore, there are a number of variants of the basic ReLU that introduce a gradient on the negative side of the domain, a commonly used variant being the leaky ReLU (Maas et al. 2013). Today, ReLUs (or variants of ReLUs) are the most frequently used neurons in deep learning research.
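The contrast between these activation functions and their derivatives can be written out in a few lines; the sample activation values below are arbitrary, and the leaky-ReLU slope of 0.01 is just one commonly quoted choice.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def d_logistic(z):
    s = logistic(z)
    return s * (1.0 - s)          # peaks at 0.25, near 0 when saturated

def relu(z):
    return np.maximum(0.0, z)

def d_relu(z):
    return (z > 0).astype(float)  # 1 for positive activations, 0 otherwise

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)   # small slope on the negative side

z = np.array([-5.0, -1.0, 0.5, 5.0])
print(d_logistic(z))   # roughly [0.007, 0.20, 0.24, 0.007] -> shrinks gradients
print(d_relu(z))       # [0, 0, 1, 1] -> gradients pass through unchanged
```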
Figure 4.6 Plots of the logistic function and the derivative of the logistic function.
The Virtuous Cycle: Better Algorithms, Faster Hardware, Bigger Data
Although improved weight initialization methods and new activation functions have both contributed to the growth of deep learning, in recent years the two most important factors driving deep learning have been the speedup in computer power and the massive increase in dataset sizes. From a computational perspective, a major breakthrough for deep learning occurred in the late 2000s with the adoption of graphical processing units (GPUs) by the deep learning community to speed up training. A neural network can be understood as a sequence of matrix multiplications that are interspersed with the application of nonlinear activation functions, and GPUs are optimized for very fast matrix multiplication. Consequently, GPUs are ideal hardware to speed up neural network training, and their use has made a significant contribution to the development of the field. In 2004, Oh and Jung reported a twentyfold performance increase using a GPU implementation of a neural network (Oh and Jung 2004), and the following year two further papers were published that demonstrated the potential of GPUs to speed up the training of neural networks: Steinkraus et al. (2005) used GPUs to train a two-layer neural network, and Chellapilla et al. (2006) used GPUs to train a CNN. However, at that time there were significant programming challenges to using GPUs for training networks (the training algorithm had to be implemented as a sequence of graphics operations), and so the initial adoption of GPUs by neural network researchers was relatively slow. These programming challenges were significantly reduced in 2007 when NVIDIA (a GPU manufacturer) released a C-like programming interface for GPUs called CUDA (compute unified device architecture).12 CUDA was specifically designed to facilitate the use of GPUs for general computing tasks. In the years following the release of CUDA, the use of GPUs to speed up neural network training became standard.
However, even with these more powerful computer processors, deep learning would not have been possible unless massive datasets had also become available. The development of the internet and social media platforms, together with the proliferation of smartphones and “internet of things” sensors, has meant that the amount of data being captured has grown at an incredible rate over the last ten years. This has made it much easier for organizations to gather large datasets. This growth in data has been incredibly important to deep learning because neural network models scale well with larger data (and in fact they can struggle with smaller datasets). It has also prompted organizations to consider how this data can be used to drive the development of new applications and innovations. This in turn has driven a need for new (more complex) computational models in order to deliver these new applications. And the combination of large data and more complex algorithms requires faster hardware in order to make the necessary computational workload tractable. Figure 4.7 illustrates the virtuous cycle between big data, algorithmic breakthroughs (e.g., better weight initialization, ReLUs, etc.), and improved hardware that is driving the deep learning revolution.
Figure 4.7 The virtuous cycle driving deep learning. Figure inspired by figure 1.2 in Reagen et al. 2017.
Summary
The history of deep learning reveals a number of underlying themes. There has been a shift from simple binary inputs to more complex continuous-valued inputs. This trend toward more complex inputs is set to continue because deep learning models are most useful in high-dimensional domains, such as image processing and language. Images often have thousands of pixels in them, and language processing requires the ability to represent and process hundreds of thousands of different words. This is why some of the best-known applications of deep learning are in these domains, for example, Facebook’s face-recognition software and Google’s neural machine translation system. However, there are a growing number of new domains where large and complex digital datasets are being gathered. One area where deep learning has the potential to make a significant impact within the coming years is healthcare, and another complex domain is the sensor-rich field of self-driving cars.
Somewhat surprisingly, at the core of these powerful models are simple information processing units: neurons. The connectionist idea that useful complex behavior can emerge from the interactions between large numbers of simple processing units is still valid today. This emergent behavior arises through the sequence of layers in a network learning a hierarchical abstraction of increasingly complex features. This hierarchical abstraction is achieved by each neuron learning a simple transformation of the input it receives. The network as a whole then composes these sequences of smaller transformations in order to apply a complex (highly nonlinear) mapping to the input. The output from the model is then generated by the final output layer of neurons, based on the learned representation generated through the hierarchical abstraction. This is why depth is such an important factor in neural networks: the deeper the network, the more powerful the model becomes in terms of its ability to learn complex nonlinear mappings. In many domains, the relationship between input data and desired outputs involves just such complex nonlinear mappings, and it is in these domains that deep learning models outperform other machine learning approaches.
An important design choice in creating a neural network is deciding which activation function to use within the neurons in the network. The activation function within each neuron is how nonlinearity is introduced into the network, and as a result it is a necessary component if the network is to learn a nonlinear mapping from inputs to outputs. As networks have evolved, so too have the activation functions used in them. New activation functions have emerged throughout the history of deep learning, often driven by the need for functions with better properties for error-gradient propagation: a major factor in the shift from threshold to logistic and tanh activation functions was the need for differentiable functions in order to apply backpropagation; the more recent shift to ReLUs was, similarly, driven by the need to improve the flow of error gradients through the network. Research on activation functions is ongoing, and new functions will be developed and adopted in the coming years.
Another important design choice in creating a neural network is deciding on the structure of the network: for example, how should the neurons in the network be connected together? In the next chapter, we will discuss two very different answers to this question: convolutional neural networks and recurrent neural networks.
5 Convolutional and Recurrent Neural Networks
Tailoring the structure of a network to the specific characteristics of the data from a task domain can reduce the training time of the network and improve the accuracy of the network. Tailoring can be done in a number of ways, such as: constraining the connections between neurons in adjacent layers to subsets (rather than having fully connected layers); forcing neurons to share weights; or introducing backward connections into the network. Tailoring in these ways can be understood as building domain knowledge into the network. Another, related, perspective is that it helps the network to learn by constraining the set of possible functions that it can learn, and by so doing guides the network to find a useful solution. It is not always clear how to fit a network structure to a domain, but for some domains where the data has a very regular structure (e.g., sequential data such as text, or gridlike data such as images) there are well-known network architectures that have proved successful. This chapter will introduce two of the most popular deep learning architectures: convolutional neural networks and recurrent neural networks.
Convolutional Neural Networks
Convolutional neural networks (CNNs) were designed for image recognition tasks and were originally applied to the challenge of handwritten digit recognition (Fukushima 1980; LeCun 1989). The basic design goal of CNNs was to create a network where the neurons in the early layers of the network would extract local visual features, and neurons in later layers would combine these features to form higher-order features. A local visual feature is a feature whose extent is limited to a small patch, a set of neighboring pixels, in an image. For example, when applied to the task of face recognition, the neurons in the early layers of a CNN learn to activate in response to simple local features (such as lines at a particular angle, or segments of curves), neurons deeper in the network combine these low-level features into features that represent body parts (such as eyes or noses), and the neurons in the final layers of the network combine body-part activations in order to be able to identify whole faces in an image.
Using this approach, the fundamental task in image recognition is learning the feature detection functions that can robustly identify the presence, or absence, of local visual features in an image. The process of learning functions is at the core of neural networks, and is achieved by learning the appropriate set of weights for the connections in the network. CNNs learn the feature detection functions for local visual features in this way. However, a related challenge is designing the architecture of the network so that the network will identify the presence of a local visual feature in an image irrespective of where in the image it occurs. In other words, the feature detection functions must be able to work in a translation invariant manner. For example, a face recognition system should be able to recognize the shape of an eye in an image whether the eye is in the center of the image or in the top-right corner of the image. This need for translation invariance has been a primary design principle of CNNs for image processing, as Yann LeCun stated in 1989: It seems useful to have a set of feature detectors that can detect a particular instance of a feature anywhere on the input plane. Since the precise location of a feature is not relevant to the classification, we can afford to lose some position information in the process. (LeCun 1989, p. 14)
CNNs achieve this translation invariance of local visual feature detection by using weight sharing between neurons. In an image recognition setting, the function implemented by a neuron can be understood as a visual feature detector. For example, neurons in the first hidden layer of the network will receive a set of pixel values as input and output a high activation if a particular pattern (local visual feature) is present in this set of pixels. The fact that the function implemented by a neuron is defined by the weights the neuron uses means that if two neurons use the same set of weights then they both implement the same function (feature detector). In chapter 4, we introduced the concept of a receptive field to describe the area that a neuron receives its input from. If two neurons share the same weights but have different receptive fields (i.e., each neuron inspects different areas of the input), then together the neurons act as a feature detector that activates if the feature occurs in either of the receptive fields. Consequently, it is possible to design a network with translation invariant feature detection by creating a set of neurons that share the same weights and that are organized so that: (1) each neuron inspects a different portion of the image; and (2) together the receptive fields of the neurons cover the entire image.
The scenario of searching an image in a dark room with a flashlight that has a narrow beam is sometimes used to explain how a CNN searches an image for local features. At each moment you can point the flashlight at a region of the image and inspect that local region. In this flashlight metaphor, the area of the image illuminated by the flashlight at any moment is equivalent to the receptive field of a single neuron, and so pointing the flashlight at a location is equivalent to applying the feature detection function to that local region. If, however, you want to be sure you inspect the whole image, then you might decide to be more systematic in how you direct the flashlight. For example, you might begin by pointing the flashlight at the top-left corner of the image and inspecting that region. You then move the flashlight to the right, across the image, inspecting each new location as it becomes visible, until you reach the right side of the image. You then point the flashlight back to the left of the image, but just below where you began, and move across the image again. You repeat this process until you reach the bottom-right corner of the image. The process of sequentially searching across an image and at each location in the search applying the same function to the local (illuminated) region is the essence of convolving a function across an image. Within a CNN, this sequential search across an image is implemented using a set of neurons that share weights and whose union of receptive fields covers the entire image.
Figure 5.1 illustrates the different stages of processing that are often found in a CNN. The matrix on the left of the figure represents the image that is the input to the CNN. The matrix immediately to the right of the input represents a layer of neurons that together search the entire image for the presence of a particular local feature. Each neuron in this layer is connected to a different receptive field (area) in the image, and they all apply the same weight matrix (the shared kernel) to their inputs.
The receptive field of the neuron in the top-left of this layer is marked with the gray square covering the area in the top-left of the input image. The dotted arrows emerging from each of the locations in this gray area represent the inputs to that neuron. The receptive field of the neighboring neuron is indicated by the square outlined in bold in the input image. Notice that the receptive fields of these two neurons overlap. The amount of overlap of receptive fields is controlled by a hyperparameter called the stride length. In this instance, the stride length is one, meaning that for each position moved in the layer the receptive field of the neuron is translated by the same amount on the input. If the stride length hyperparameter is increased, the amount of overlap between receptive fields is decreased.
The receptive fields of both of these neurons are matrices of pixel values, and the weights used by these neurons are also matrices. In computer vision, the matrix of weights applied to an input is known as the kernel (or convolution mask); the operation of sequentially passing a kernel across an image and, within each local region, weighting each input and summing the results is known as a convolution. Notice that a convolution operation does not include a nonlinear activation function (this is applied at a later stage in processing). The kernel defines the feature detection function that all the neurons in the convolution implement. Convolving a kernel across an image is equivalent to passing a local visual feature detector across the image and recording all the locations in the image where the visual feature was present. The output from this process is a map of all the locations in the image where the relevant visual feature occurred. For this reason, the output of a convolution process is sometimes known as a feature map. As noted above, the convolution operation does not include a nonlinear activation function (it only involves a weighted summation of the inputs). Consequently, it is standard to apply a nonlinearity operation to a feature map. Frequently, this is done by applying a rectified linear function to each position in a feature map; the rectified linear activation function is defined as relu(z) = max(0, z). Passing a rectified linear activation function over a feature map simply changes all negative values to 0. In figure 5.1, the process of updating a feature map by applying a rectified linear activation function to each of its elements is represented by the layer labeled Nonlinearity.
The quote from Yann LeCun, at the start of this section, mentions that the precise location of a feature in an image may not be relevant to an image processing task. With this in mind, CNNs often discard location information in favor of generalizing the network’s ability to do image classification. Typically, this is achieved by down-sampling the updated feature map using a pooling layer. In some ways pooling is similar to the convolution operation described above, in so far as pooling involves repeatedly applying the same function across an input space. For pooling, the input space is frequently a feature map whose elements have been updated using a rectified linear function. Furthermore, each pooling operation has a receptive field on the input space—although, for pooling, the receptive fields sometimes do not overlap. There are a number of different pooling functions used; the most common is called max pooling, which returns the maximum value of any of its inputs. Calculating the average value of the inputs is also used as a pooling function.
The operation sequence of applying a convolution, followed by a nonlinearity, to the feature map, and then down-sampling using pooling, is relatively standard across most CNNs. Often these three operations are together considered to define a convolutional layer in a network, and this is how they are presented in figure 5.1.
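A compact numpy sketch of this convolution, nonlinearity, and pooling sequence is given below. The image size, kernel values, stride, and pooling window are illustrative assumptions; real CNN libraries implement the same operations far more efficiently.

```python
import numpy as np

def convolve(image, kernel, stride=1):
    # Slide the shared kernel across the image; each position produces one
    # feature-map value (a weighted sum over that receptive field).
    k = kernel.shape[0]
    out_size = (image.shape[0] - k) // stride + 1
    fmap = np.zeros((out_size, out_size))
    for r in range(out_size):
        for c in range(out_size):
            patch = image[r*stride:r*stride + k, c*stride:c*stride + k]
            fmap[r, c] = np.sum(patch * kernel)
    return fmap

def relu(x):
    return np.maximum(0.0, x)

def max_pool(fmap, size=2):
    # Down-sample by keeping only the maximum value in each pooling window.
    out = np.zeros((fmap.shape[0] // size, fmap.shape[1] // size))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = fmap[r*size:(r+1)*size, c*size:(c+1)*size].max()
    return out

image = np.random.rand(6, 6)          # a toy 6x6 "image"
kernel = np.array([[-1.0, 0.0, 1.0],  # an illustrative 3x3 vertical-edge kernel
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

feature_map = convolve(image, kernel)   # 4x4 feature map
activated = relu(feature_map)           # nonlinearity
pooled = max_pool(activated)            # 2x2 down-sampled map
print(feature_map.shape, activated.shape, pooled.shape)
```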
The fact that a convolution searches an entire image means that if the visual feature (pixel pattern) that the function (defined by the shared kernel) detects occurs anywhere in the image, its presence will be recorded in the feature map (and if pooling is used, also in the subsequent output from the pooling layer). In this way, a CNN supports translation invariant visual feature detection. However, this has the limitation that the convolution can only identify a single type of feature. CNNs generalize beyond one feature by training multiple convolutions (or filters) in parallel, with each filter learning a single kernel matrix (feature detection function). Note that the convolutional layer in figure 5.1 illustrates a single filter. The outputs of multiple filters can be integrated in a variety of ways. One way to integrate information from different filters is to take the feature maps generated by the separate filters and combine them into a single multifilter feature map. A subsequent convolutional layer then takes this multifilter feature map as input. Another way to integrate information from different filters is to use a densely connected layer of neurons. The final layer in figure 5.1 illustrates a dense layer. This dense layer operates in exactly the same way as a standard layer in a fully connected feedforward network. Each neuron in the dense layer is connected to all of the elements output by each of the filters, and each neuron learns a set of weights unique to itself that it applies to the inputs. This means that each neuron in a dense layer can learn a different way to integrate information from across the different filters.
Figure 5.1 Illustrations of the different stages of processing in a convolutional layer. Note in this figure the Image and Feature Map are data structures; the other stages represent operations on data.
The AlexNet CNN, which won the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) in 2012, had five convolutional layers, followed by three dense layers. The first convolutional layer had ninety-six different kernels (or filters) and included a ReLU nonlinearity and pooling. The second convolution layer had 256 kernels and also included ReLU nonlinearity and pooling. The third, fourth, and fifth convolutional layers did not include a nonlinearity step or pooling, and had 384, 384, and 256 kernels, respectively. Following the fifth convolutional layer, the network had three dense layers with 4096 neurons each. In total, AlexNet had sixty million weights and 650,000 neurons. Although sixty million weights is a large number, the fact that many of the neurons shared weights actually reduced the number of weights in the network. This reduction in the number of required weights is one of the advantages of CNN networks. In 2015, Microsoft Research developed a CNN network called ResNet, which won the ILSVRC 2015 challenge (He et al. 2016). The ResNet architecture extended the standard CNN architecture using skip-connections. A skip-connection takes the output from one layer in the network and feeds it directly into a layer that may be much deeper in the network. Using skip-connections it is possible to train very deep networks. In fact, the ResNet model developed by Microsoft Research had a depth of 152 layers.
Recurrent Neural Networks
Recurrent neural networks (RNNs) are tailored to the processing of sequential data. An RNN processes a sequence of data by processing each element in the sequence one at a time. An RNN network only has a single hidden layer, but it also has a memory buffer that stores the output of this hidden layer for one input and feeds it back into the hidden layer along with the next input from the sequence. This recurrent flow of information means that the network processes each input within the context generated by processing the previous input, which in turn was processed in the context of the input preceding it. In this way, the information that flows through the recurrent loop encodes contextual information from (potentially) all of the preceding inputs in the sequence. This allows the network to maintain a memory of what it has seen previously in the sequence to help it decide what to do with the current input. The depth of an RNN arises from the fact that the memory vector is propagated forward and evolved through each input in the sequence; as a result an RNN network is considered as deep as a sequence is long.
Figure 5.2 illustrates the architecture of an RNN and shows how information flows through the network as it processes a sequence. At each time step, the network in this figure receives a vector containing two elements as input. The schematic on the left of figure 5.2 (time step=1.0) shows the flow of information in the network when it receives the first input in the sequence. This input vector is fed forward into the three neurons in the hidden layer of the network. At the same time these neurons also receive whatever information is stored in the memory buffer. Because this is the initial input, the memory buffer will only contain default initialization values. Each of the neurons in the hidden layer will process the input and generate an activation. The schematic in the middle of figure 5.2 (time step=1.5) shows how this activation flows on through the network: the activation of each neuron is passed to the output layer where it is processed to generate the output of the network, and it is also stored in the memory buffer (overwriting whatever information was stored there). The elements of the memory buffer simply store the information written to them; they do not transform it in any way. As a result, there are no weights on the edges going from the hidden units to the buffer. There are, however, weights on all the other edges in the network, including those from the memory buffer units to the neurons in the hidden layer. At time step 2, the network receives the next input from the sequence, and this is passed to the hidden layer neurons along with the information stored in the buffer. This time the buffer contains the activations that were generated by the hidden neurons in response to the first input.
Figure 5.2 The flow of information in an RNN as it processes a sequence of inputs. The arrows in bold are the active paths of information flow at each time point; the dashed arrows show connections that are not active at that time.
Figure 5.3 shows an RNN that has been unrolled through time as it processes a sequence of inputs. Each box in this figure represents a layer of neurons. One box represents the state of the memory buffer when the network is initialized; one set of boxes represents the hidden layer of the network at each time step; and another set of boxes represents the output layer of the network at each time step. Each of the arrows in the figure represents a set of connections between one layer and another layer. For example, the vertical arrow from the first input, x1, to the hidden layer represents the connections between the input layer and the hidden layer at time step 1. Similarly, the horizontal arrows connecting the hidden layers represent the storing of the activations from a hidden state at one time step in the memory buffer (not shown) and the propagation of these activations to the hidden layer at the next time step through the connections from the memory buffer to the hidden state. At each time step, an input from the sequence is presented to the network and is fed forward to the hidden layer. The hidden layer generates a vector of activations that is passed to the output layer and is also propagated forward to the next time step along the horizontal arrows connecting the hidden states.
Figure 5.3 An RNN network unrolled through time as it processes a sequence of inputs [x1, x2, …, xt]
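The unrolled computation can also be written as a simple loop: at each time step the hidden layer combines the current input with the previous hidden activations (the contents of the memory buffer). The layer sizes, the tanh activation, and the omission of bias terms below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

INPUT, HIDDEN, OUTPUT = 2, 3, 1   # illustrative sizes
W_xh = rng.normal(0, 0.5, (HIDDEN, INPUT))    # input -> hidden weights
W_hh = rng.normal(0, 0.5, (HIDDEN, HIDDEN))   # memory buffer -> hidden weights
W_hy = rng.normal(0, 0.5, (OUTPUT, HIDDEN))   # hidden -> output weights

def run_rnn(sequence):
    h = np.zeros(HIDDEN)          # the memory buffer's initial (default) state
    outputs = []
    for x in sequence:
        # The hidden layer sees both the current input and the previous
        # hidden activations (the contents of the memory buffer).
        h = np.tanh(W_xh @ x + W_hh @ h)
        outputs.append(W_hy @ h)  # the output for this time step
    return outputs

sequence = [np.array([0.0, 1.0]), np.array([1.0, 0.0]), np.array([1.0, 1.0])]
print(run_rnn(sequence))
```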
Although RNNs can process a sequence of inputs, they struggle with the problem of vanishing gradients. This is because training an RNN to process a sequence of inputs requires the error to be backpropagated through the entire length of the sequence. For example, for the network in figure 5.3, the error calculated on the final output yt must be backpropagated through the entire network so that it can be used to update the weights on the connections into the hidden layer at the first time step (the connections from x1 and h0 to h1). This entails backpropagating the error through all the hidden layers, which in turn involves repeatedly multiplying the error by the weights on the connections feeding activations from one hidden layer forward to the next hidden layer. A particular problem with this process is that the same set of weights is used on all the connections between the hidden layers: each horizontal arrow represents the same set of connections between the memory buffer and the hidden layer, and the weights on these connections are fixed through time (i.e., they don't change from one time step to the next during the processing of a given sequence of inputs). Consequently, backpropagating an error through k time steps involves (among other multiplications) multiplying the error gradient by the same set of weights k times. This is equivalent to multiplying each error gradient by a weight raised to the power of k. If this weight is less than 1, then when it is raised to a power it diminishes at an exponential rate, and consequently the error gradient also tends to diminish at an exponential rate with respect to the length of the sequence, and so it vanishes.
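The exponential shrinkage is easy to see numerically. The toy sketch below (the weight value 0.5 and the step counts are illustrative, not values from the text) repeatedly multiplies an error gradient by the same weight, as happens when backpropagating through many time steps.

```python
# Repeatedly multiplying an error gradient by the same weight (here 0.5) shrinks it
# exponentially with the number of time steps it is backpropagated through.
gradient = 1.0
weight = 0.5
for k in range(1, 21):
    gradient *= weight
    if k in (1, 5, 10, 20):
        print(f"after {k:2d} steps: {gradient:.10f}")
# after 20 steps the gradient is roughly 0.00000095, effectively vanished
```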
Long short-term memory networks (LSTMs) are designed to reduce the effect of vanishing gradients by removing the repeated multiplication by the same weight vector during backpropagation in an RNN. At the core of an LSTM unit is a component called the cell. The cell is where the activation (the short-term memory) is stored and propagated forward. In fact, the cell often maintains a vector of activations. The propagation of the activations within the cell through time is controlled by three components called gates: the forget gate, the input gate, and the output gate. The forget gate is responsible for determining which activations in the cell should be forgotten at each time step, the input gate controls how the activations in the cell should be updated in response to the new input, and the output gate controls which activations should be used to generate the output in response to the current input. Each of the gates consists of layers of standard neurons, with one neuron in the layer per activation in the cell state.
Figure 5.4 illustrates the internal structure of an LSTM cell. Each of the arrows in this image represents a vector of activations. The cell state runs along the top of the figure from left (the cell state arriving from the previous time step) to right (the updated cell state passed on to the next time step). Activations in the cell can take values in the range −1 to +1. Stepping through the processing for a single input, the input vector is first concatenated with the hidden state vector that has been propagated forward from the preceding time step. Working from left to right through the processing of the gates, the forget gate takes the concatenation of the input and the hidden state and passes this vector through a layer of neurons that use a sigmoid (also known as logistic) activation function. Because the neurons in the forget layer use sigmoid activation functions, the output of this forget layer is a vector of values in the range 0 to 1. The cell state is then multiplied by this forget vector. The result of this multiplication is that activations in the cell state that are multiplied by components of the forget vector with values near 0 are forgotten, and activations that are multiplied by forget vector components with values near 1 are remembered. In effect, multiplying the cell state by the output of a sigmoid layer acts as a filter on the cell state.
Next, the input gate decides what information should be added to the cell state. The processing in this step is done by the components in the middle block of figure 5.4, marked Input. This processing is broken down into two subparts. First, the gate decides which elements in the cell state should be updated, and second, it decides what information should be included in the update. The decision regarding which elements in the cell state should be updated is implemented using a filter mechanism similar to the forget gate: the concatenated input and hidden state vector is passed through a layer of sigmoid units to generate a vector of elements, the same width as the cell, where each element in the vector is in the range 0 to 1; values near 0 indicate that the corresponding cell element will not be updated, and values near 1 indicate that the corresponding cell element will be updated. At the same time that the filter vector is generated, the concatenated input and hidden state are also passed through a layer of tanh units (i.e., neurons that use the tanh activation function). Again, there is one tanh unit for each activation in the LSTM cell. This vector represents the information that may be added to the cell state. Tanh units are used to generate this update vector because tanh units output values in the range −1 to +1, and consequently the value of the activations in the cell elements can be both increased and decreased by an update. Once these two vectors have been generated, the final update vector is calculated by multiplying the vector output from the tanh layer by the filter vector generated from the sigmoid layer. The resulting vector is then added to the cell using vector addition.
Figure 5.4 Schematic of the internal structure of an LSTM unit: σ represents a layer of neurons with sigmoid activations, T represents a layer of neurons with tanh activations, × represents vector multiplication, and + represents vector addition. The figure is inspired by an image by Christopher Olah available at: http://colah.github.io/posts/2015-08-Understanding-LSTMs/.
The final stage of processing in an LSTM is to decide which elements of the cell should be output in response to the current input. This processing is done by the components in the block marked Output (on the right of figure 5.4). A candidate output vector is generated by passing the cell through a tanh layer. At the same time, the concatenated input and propagated hidden state vector are passed through a layer of sigmoid units to create another filter vector. The actual output vector is then calculated by multiplying the candidate output vector by this filter vector. The resulting vector is then passed to the output layer, and is also propagated forward to the next time step as the new hidden state.
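The gate computations just described can be written out compactly. The sketch below is a minimal single-time-step version, assuming a cell of width 4 and an input of width 3; the weight matrix names (Wf, Wi, Wu, Wo), the omission of bias terms, and the sizes are illustrative assumptions, not details taken from the text or from figure 5.4.

```python
# A sketch of one LSTM time step: forget gate filters the cell, input gate filters a
# tanh candidate update and adds it to the cell, output gate filters what is exposed.
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

rng = np.random.default_rng(1)
cell_width, input_width = 4, 3
Wf = rng.normal(size=(cell_width, cell_width + input_width))  # forget gate layer
Wi = rng.normal(size=(cell_width, cell_width + input_width))  # input gate filter layer
Wu = rng.normal(size=(cell_width, cell_width + input_width))  # candidate update (tanh) layer
Wo = rng.normal(size=(cell_width, cell_width + input_width))  # output gate layer

c = np.zeros(cell_width)            # cell state running along the top of figure 5.4
h = np.zeros(cell_width)            # hidden state propagated from the previous step
x = rng.normal(size=input_width)    # current input vector

v = np.concatenate([x, h])          # concatenate input and hidden state
f = sigmoid(Wf @ v)                 # forget vector: near 0 forgets, near 1 remembers
c = c * f                           # filter the cell state
i = sigmoid(Wi @ v)                 # which cell elements to update
u = np.tanh(Wu @ v)                 # candidate update values in the range -1 to +1
c = c + i * u                       # add the filtered update to the cell
o = sigmoid(Wo @ v)                 # which cell elements to expose as output
h = o * np.tanh(c)                  # new hidden state, passed onward and to the next step
print(h)
```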
The fact that an LSTM unit contains multiple layers of neurons means that an LSTM is a network in itself. However, an RNN can be constructed by treating an LSTM as the hidden layer in the RNN. In this configuration, an LSTM unit receives an input at each time step and generates an output for each input. RNNs that use LSTM units are often known as LSTM networks.
LSTM networks are ideally suited to natural language processing (NLP). A key challenge in using a neural network to do natural language processing is that the words in language must be converted into vectors of numbers. The word2vec models, created by Tomas Mikolov and colleagues at Google, are one of the most popular ways of doing this conversion (Mikolov et al. 2013). The word2vec models are based on the idea that words that appear in similar contexts have similar meanings, where the context of a word is simply the words that surround it. So, for example, the words London and Paris are semantically similar because each of them often co-occurs with words that the other word also co-occurs with, such as capital, city, Europe, holiday, airport, and so on. The word2vec models are neural networks that implement this idea of semantic similarity by initially assigning random vectors to each word and then using co-occurrences within a corpus to iteratively update these vectors so that semantically similar words end up with similar vectors. These vectors (known as word embeddings) are then used to represent a word when it is being input to a neural network.
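Once embeddings have been learned, "similar meaning" becomes "similar vector", which can be measured with cosine similarity. The sketch below is purely illustrative: the 4-dimensional vectors are made up for the example (real word2vec embeddings typically have hundreds of dimensions and are learned from a corpus), and the helper function is mine.

```python
# Comparing made-up word embeddings with cosine similarity: semantically related words
# (london, paris) get a high score, unrelated words (london, banana) get a low score.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

embeddings = {
    "london": np.array([0.8, 0.1, 0.7, 0.2]),
    "paris":  np.array([0.7, 0.2, 0.8, 0.1]),
    "banana": np.array([0.1, 0.9, 0.0, 0.6]),
}
print(cosine(embeddings["london"], embeddings["paris"]))   # high similarity (~0.98)
print(cosine(embeddings["london"], embeddings["banana"]))  # low similarity (~0.25)
```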
One of the areas of NLP where deep learning has had a major impact is machine translation. Figure 5.5 presents a high-level schematic of the seq2seq (or encoder-decoder) architecture for neural machine translation (Sutskever et al. 2014). This architecture is composed of two LSTM networks that have been joined together. The first LSTM network (the encoder) processes the input sentence in a word-by-word fashion. In this example, the source language is French. The words are entered into the system in reverse order, as it has been found that this leads to better translations. The symbol <eos> is a special end-of-sentence symbol. As each word is entered, the encoder updates its hidden state and propagates it forward to the next time step. The hidden state generated by the encoder in response to the <eos> symbol is taken to be a vector representation of the input sentence. This vector is passed as the initial input to the decoder LSTM. The decoder is trained to output the translated sentence word by word, and after each word has been generated, this word is fed back into the system as the input for the next time step. In a way, the decoder is hallucinating the translation because it uses its own output to drive its own generation process. This process continues until the decoder outputs an <eos> symbol.
Figure 5.5 Schematic of the seq2seq (or encoder-decoder) architecture.
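The skeleton of this encode-then-decode loop is sketched below. Everything in it is a placeholder: the tiny vocabularies, the simple tanh recurrences standing in for trained LSTM networks, the one-hot inputs standing in for learned embeddings, and the greedy word-by-word decoding are all illustrative assumptions, so the "translation" it prints is meaningless until the weights are trained.

```python
# Schematic seq2seq loop: encode the reversed source sentence into a vector, then
# decode word by word, feeding each generated word back in as the next input.
import numpy as np

rng = np.random.default_rng(2)
hidden = 8
src_vocab = ["<eos>", "le", "chat", "dort"]
tgt_vocab = ["<eos>", "the", "cat", "sleeps"]
Enc_x = rng.normal(size=(hidden, len(src_vocab)))
Enc_h = rng.normal(size=(hidden, hidden))
Dec_x = rng.normal(size=(hidden, len(tgt_vocab)))
Dec_h = rng.normal(size=(hidden, hidden))
Out = rng.normal(size=(len(tgt_vocab), hidden))

def one_hot(i, n):
    v = np.zeros(n); v[i] = 1.0; return v

# Encoder: process the source words in reverse order, then the <eos> symbol.
h = np.zeros(hidden)
for w in reversed(["le", "chat", "dort"]):
    h = np.tanh(Enc_x @ one_hot(src_vocab.index(w), len(src_vocab)) + Enc_h @ h)
h = np.tanh(Enc_x @ one_hot(src_vocab.index("<eos>"), len(src_vocab)) + Enc_h @ h)
# h is now the vector representation of the whole input sentence.

# Decoder: generate one word at a time, feeding each output back in as the next input.
word = "<eos>"  # conventional start token in this sketch
for _ in range(5):
    h = np.tanh(Dec_x @ one_hot(tgt_vocab.index(word), len(tgt_vocab)) + Dec_h @ h)
    word = tgt_vocab[int(np.argmax(Out @ h))]
    print(word)
    if word == "<eos>":
        break
```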
The idea of using a vector of numbers to represent the (interlingual) meaning of a sentence is very powerful, and this concept has been extended to the idea of using vectors to represent intermodal/multimodal representations. For example, an exciting development in recent years has been the emergence of automatic image captioning systems. These systems can take an image as input and generate a natural language description of the image. The basic structure of these systems is very similar to the neural machine translation architecture shown in figure 5.5. The main difference is that the encoder LSTM network is replaced by a CNN architecture that processes the input image and generates a vector representation that is then propagated to the decoder LSTM (Xu et al. 2015). This is another example of the power of deep learning arising from its ability to learn complex representations of information. In this instance, the system learns intermodal representations that enable information to flow from what is in an image to language. Combining CNN and RNN architectures is becoming more and more popular because it offers the potential to integrate the advantages of both architectures and enables deep learning systems to handle very complex data.
Irrespective of the network architecture we use, we need to find the correct weights for the network if we wish to create an accurate model. The weights of a neuron determine the transformation the neuron applies to its inputs. So, it is the weights of the network that define the fundamental building blocks of the representation the network learns. Today the standard method for finding these weights is an algorithm that came to prominence in the 1980s: backpropagation. The next chapter will present a comprehensive introduction to this algorithm.
6 Learning Functions
A neural network model, no matter how deep or complex, implements a function, a mapping from inputs to outputs. The function implemented by a network is determined by the weights the network uses. So, training a network (learning the function the network should implement) on data involves searching for the set of weights that best enable the network to model the patterns in the data. The most commonly used algorithm for learning patterns from data is the gradient descent algorithm. The gradient descent algorithm is very like the perceptron learning rule and the LMS algorithm described in chapter 4: it defines a rule to update the weights used in a function based on the error of the function. By itself the gradient descent algorithm can be used to train a single output neuron. However, it cannot be used to train a deep network with multiple hidden layers. This limitation is because of the credit assignment problem: how should the blame for the overall error of a network be shared out among the different neurons (including the hidden neurons) in the network? Consequently, training a deep neural network involves using both the gradient descent algorithm and the backpropagation algorithm in tandem.
The process used to train a deep neural network can be characterized as: randomly initializing the weights of a network, and then iteratively updating the weights of the network, in response to the errors the network makes on a dataset, until the network is working as expected. Within this training framework, the backpropagation algorithm solves the credit (or blame) assignment problem, and the gradient descent algorithm defines the learning rule that actually updates the weights in the network.
This chapter is the most mathematical chapter in the book. However, at a high level, all you need to know about the backpropagation algorithm and the gradient descent algorithm is that they can be used to train deep networks. So, if you don't have the time to work through the details in this chapter, feel free to skim through it. If, however, you wish to get a deeper understanding of these two algorithms, then I encourage you to engage with the material. These algorithms are at the core of deep learning, and understanding how they work is possibly the most direct way of understanding the potential and the limitations of deep learning. I have attempted to present the material in this chapter in an accessible way, so if you are looking for a relatively gentle but still comprehensive introduction to these algorithms, then I believe that this chapter will provide it. The chapter begins by explaining the gradient descent algorithm, and then explains how gradient descent can be used in conjunction with the backpropagation algorithm to train a neural network.
Gradient Descent
A very simple type of function is a linear mapping from a single input to a single output. Table 6.1 presents a dataset with a single input feature and a single output. Figure 6.1 presents a scatterplot of this data along with a plot of the line that best fits this data. This line can be used as a function to map from an input value to a prediction of the output value. For example, if x = 0.9, then the response returned by this linear function is y = 0.6746. The error (or loss) of using this line as a model for the data is shown by the dashed lines from the line to each datum.
Table 6.1. A sample dataset with one input feature, x, and an output (target) feature, y
x      y
0.72   0.54
0.45   0.56
0.23   0.38
0.76   0.57
0.14   0.17
Figure 6.1 Scatterplot of data with “best fit” line and the errors of the line on each example plotted as vertical dashed line segments. The figure also shows the mapping defined by the line for input x=0.9 to output y=0.6746.
In chapter 2, we described how a linear function can be represented using the equation of a line:

y = (slope × x) + intercept

where the slope determines how steeply the line rises, and the intercept (the y-intercept) specifies where the line crosses the y-axis. For the line in figure 6.1, the slope is 0.524 and the y-intercept is 0.203; this is why the function returns the value y = 0.6746 when x = 0.9, as in the following:

y = 0.203 + (0.524 × 0.9) = 0.6746

The slope and the y-intercept are the parameters of this model, and these parameters can be varied to fit the model to the data.
The equation of a line has a close relationship with the weighted sum operation used in a neuron. This becomes apparent if we rewrite the equation of a line with the model parameters rewritten as weights (the y-intercept becomes weight w0 and the slope becomes weight w1):

y = w0 + (w1 × x)
Different lines (different linear models for the data) can be created by varying either of these weights (or model parameters). Figure 6.2 illustrates how a line changes as the intercept and slope of the line vary: the dashed line illustrates what happens if the y-intercept is increased, and the dotted line shows what happens if the slope is decreased. Changing the y-intercept w0 vertically translates the line, whereas modifying the slope w1 rotates the line around the point where it crosses the y-axis.
Each of these new lines defines a different function, mapping from x to y, and each function will have a different error with respect to how well it matches the data. Looking at figure 6.2, we can see that the full line, y = 0.203 + (0.524 × x), fits the data better than the other two lines because on average it passes closer to the data points. In other words, on average the error of this line on each data point is less than the errors of the other two lines. The total error of a model on a dataset can be measured by summing together the error the model makes on each example in the dataset. The standard way to calculate this total error is to use an equation known as the sum of squared errors (SSE):

SSE = ½ × Σ_{j=1}^{n} (t_j − y_j)²
Figure 6.2 Plot illustrating how a line changes as the intercept (w0) and slope (w1) are varied.
This equation tells us how to add together the errors of a model on a dataset containing n examples. For each of the examples in the dataset, the equation calculates the error of the model by subtracting the prediction of the target value returned by the model from the correct target value for that example, as specified in the dataset. In this equation, t_j is the correct value of the target feature listed in the dataset for example j, and y_j is the estimate of the target value returned by the model for the same example. Each of these errors is then squared, and these squared errors are then summed. Squaring the errors ensures that they are all positive, and therefore in the summation the errors for examples where the function underestimated the target do not cancel out the errors on examples where it overestimated the target. The multiplication of the summation of the errors by ½, although not important for the current discussion, will become useful later. The lower the SSE of a function, the better the function models the data. Consequently, the sum of squared errors can be used as a fitness function to evaluate how well a candidate function (in this situation a model instantiating a line) matches the data.
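The numbers used so far in this chapter can be checked with a few lines of code. The sketch below fits the best line to the table 6.1 data with ordinary least squares and then computes the SSE of that line; the helper names are mine, while the data and the expected results (w0 ≈ 0.203, w1 ≈ 0.524, y ≈ 0.6746 at x = 0.9) come from the text and the figure captions.

```python
# Fit the best line to the table 6.1 data and compute its sum of squared errors.
import numpy as np

x = np.array([0.72, 0.45, 0.23, 0.76, 0.14])   # input feature from table 6.1
t = np.array([0.54, 0.56, 0.38, 0.57, 0.17])   # target feature from table 6.1

# least-squares estimates of the slope and intercept
w1 = np.sum((x - x.mean()) * (t - t.mean())) / np.sum((x - x.mean()) ** 2)
w0 = t.mean() - w1 * x.mean()
print(round(w0, 3), round(w1, 3))      # approximately 0.203 and 0.524

y = w0 + w1 * x                        # model predictions for each example
sse = 0.5 * np.sum((t - y) ** 2)       # sum of squared errors (with the 1/2 factor)
print(round(sse, 4))
print(round(w0 + w1 * 0.9, 4))         # prediction for x = 0.9, approximately 0.6746
```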
Figure 6.3 shows how the error of a linear model varies as the parameters of the model change. These plots show the SSE of a linear model on the example single-input, single-output dataset listed in table 6.1. For each parameter there is a single best setting, and as the parameter moves away from this setting (in either direction) the error of the model increases. A consequence of this is that the error profile of the model as each parameter varies is convex (bowl-shaped). This convex shape is particularly apparent in the top and middle plots in figure 6.3, which show that the SSE of the model is minimized when w0 = 0.203 (the lowest point of the curve in the top plot) and when w1 = 0.524 (the lowest point of the curve in the middle plot).
Figure 6.3 Plots of the changes in the error (SSE) of a linear model as the parameters of the model change. Top: the SSE profile of a linear model with a fixed slope w1=0.524 when w0 ranges across the interval 0.3 to 1. Middle: the SSE profile of a linear model with a y-intercept fixed at w0=0.203 when w1 ranges across the interval 0 to 1. Bottom: the error surface of the linear model when both w0 and w1 are varied.
If we plot the error of the model as both parameters are varied, we generate a three-dimensional convex bowl-shaped surface, known as an error surface. The bowl-shaped mesh in the plot at the bottom of figure 6.3 illustrates this error surface. This error surface was created by first defining a weight space. This weight space is represented by the flat grid at the bottom of the plot. Each coordinate in this weight space defines a different line because each coordinate specifies an intercept (a w0 value) and a slope (a w1 value). Consequently, moving across this planar weight space is equivalent to moving between different models. The second step in constructing the error surface is to associate an elevation with each line (i.e., coordinate) in the weight space. The elevation associated with each weight space coordinate is the SSE of the model defined by that coordinate; or, put more directly, the height of the error surface above the weight space plane is the SSE of the corresponding linear model when it is used as a model for the dataset. The weight space coordinates that correspond with the lowest point of the error surface define the linear model that has the lowest SSE on the dataset (i.e., the linear model that best fits the data).
The shape of the error surface in the bottom plot of figure 6.3 indicates that there is only a single best linear model for this dataset because there is a single point at the bottom of the bowl that has a lower elevation (lower error) than any other point on the surface. Moving away from this best model (by varying the weights of the model) necessarily involves moving to a model with a higher SSE. Such a move is equivalent to moving to a new coordinate in the weight space, which has a higher elevation associated with it on the error surface. A convex or bowl-shaped error surface is incredibly useful for learning a linear function to model a dataset because it means that the learning process can be framed as a search for the lowest point on the error surface. The standard algorithm used to find this lowest point is known as gradient descent.
The gradient descent algorithm begins by creating an initial model using a randomly selected set of weights. Next, the SSE of this randomly initialized model is calculated. Taken together, the guessed set of weights and the SSE of the corresponding model define the initial starting point on the error surface for the search. It is very likely that the randomly initialized model will be a bad model, so it is very likely that the search will begin at a location that has a high elevation on the error surface. This bad start, however, is not a problem, because once the search process is positioned on the error surface, the process can find a better set of weights by simply following the gradient of the error surface downhill until it reaches the bottom of the error surface (the location where moving in any direction results in an increase in SSE). This is why the algorithm is known as gradient descent: the gradient that the algorithm descends is the gradient of the error surface of the model with respect to the data.
An important point is that the search does not progress from the starting location to the valley floor in one weight update. Instead, it moves toward the bottom of the error surface in an iterative manner, and during each iteration the current set of weights is updated so as to move to a nearby location in the weight space that has a lower SSE. Reaching the bottom of the error surface can take a large number of iterations. An intuitive way of understanding the process is to imagine a hiker who is caught on the side of a hill when a thick fog descends. Their car is parked at the bottom of the valley; however, due to the fog they can only see a few feet in any direction. Assuming that the valley has a nice convex shape to it, they can still find their way to their car, despite the fog, by repeatedly taking small steps that move down the hill, following the local gradient at the position where they are currently located. A single run of a gradient descent search is illustrated in the bottom plot of figure 6.3. The black curve plotted on the error surface illustrates the path the search followed down the surface, and the black line on the weight space plots the corresponding weight updates that occurred during the journey down the error surface. Technically, the gradient descent algorithm is known as an optimization algorithm because the goal of the algorithm is to find the optimal set of weights.
The most important component of the gradient descent algorithm is the rule that defines how the weights are updated during each iteration of the algorithm. In order to understand how this rule is defined, it is first necessary to understand that the error surface is made up of multiple error gradients. For our simple example, the error surface is created by combining two error curves. One error curve is defined by the changes in the SSE as w0 changes, shown in the top plot of figure 6.3. The other error curve is defined by the changes in the SSE as w1 changes, shown in the plot in the middle of figure 6.3. Notice that the gradient of each of these curves can vary along the curve; for example, each curve has a steep gradient at the extreme left and right of its plot, but the gradient becomes somewhat shallower in the middle of the curve. Also, the gradients of two different curves can vary dramatically; in this particular example, one of the error curves generally has a much steeper gradient than the other.
The fact that the error surface is composed of multiple curves, each with a different gradient, is important because the gradient descent algorithm moves down the combined error surface by independently updating each weight so as to move down the error curve associated with that weight. In other words, during a single iteration of the gradient descent algorithm, w0 is updated to move down the w0 error curve, and w1 is updated to move down the w1 error curve. Furthermore, the amount each weight is updated in an iteration is proportional to the steepness of the gradient of the weight's error curve, and this gradient will vary from one iteration to the next as the process moves down the error curve. For example, a weight will be updated by relatively large amounts in iterations where the search process is located high up on either side of that weight's error curve, but by smaller amounts in iterations where the search process is nearer to the bottom of the error curve.
The error curve associated with each weight is defined by how the SSE changes with respect to the change in the value of the weight. Calculus, and in particular differentiation, is the field of mathematics that deals with rates of change. For example, taking the derivative of a function, y = f(x), calculates the rate of change of y (the output) for each unit change in x (the input). Furthermore, if a function takes multiple inputs, y = f(x_1, x_2, …, x_m), then it is possible to calculate the rate of change of the output, y, with respect to changes in each of these inputs, x_i, by taking the partial derivative of the function with respect to each input. The partial derivative of a function with respect to a particular input is calculated by first assuming that all the other inputs are held constant (and so their rate of change is 0 and they disappear from the calculation) and then taking the derivative of what remains. Finally, the rate of change of a function for a given input is also known as the gradient of the function at the location on the curve (defined by the function) that is specified by the input. Consequently, the partial derivative of the SSE with respect to a weight specifies how the output of the SSE changes as that weight changes, and so it specifies the gradient of the error curve of the weight. This is exactly what is needed to define the gradient descent weight update rule: the partial derivative of the SSE with respect to a weight specifies how to calculate the gradient of the weight's error curve, and in turn this gradient specifies how the weight should be updated to reduce the error (the output of the SSE).
The partial derivative of a function with respect to a particular variable is the derivative of the function when all the other variables are held constant. As a result, there is a different partial derivative of a function with respect to each variable, because a different set of terms is considered constant in the calculation of each of the partial derivatives. Therefore, there is a different partial derivative of the SSE for each weight, although they all have a similar form. This is why each of the weights is updated independently in the gradient descent algorithm: the weight update rule is dependent on the partial derivative of the SSE for each weight, and because there is a different partial derivative for each weight, there is a separate weight update rule for each weight. Again, although the partial derivative for each weight is distinct, all of these derivatives have the same form, and so the weight update rule for each weight will also have the same form. This simplifies the definition of the gradient descent algorithm. Another simplifying factor is that the SSE is defined relative to a dataset with n examples. The relevance of this is that the only variables in the SSE are the weights; the target outputs and the inputs are all specified by the dataset for each example, and so can be considered constants. As a result, when calculating the partial derivative of the SSE with respect to a weight, many of the terms in the equation that do not include the weight can be deleted because they are considered constants.
The relationship between the output of the SSE and each weight becomes more explicit if the SSE definition is rewritten so that the term y_j, denoting the output predicted by the model, is replaced by the structure of the model generating the prediction. For the model with a single input and a dummy input (the dummy input is fixed at x_{j,0} = 1 so that the intercept w_0 can be treated as a weight), this rewritten version of the SSE is:

SSE = ½ × Σ_{j=1}^{n} (t_j − (w_0 × x_{j,0} + w_1 × x_{j,1}))²

This equation uses a double subscript on the inputs: the first subscript identifies the example (or row in the dataset), and the second subscript specifies the feature (or column in the dataset) of the input. For example, x_{j,1} represents feature 1 from example j. This definition of the SSE can be generalized to a model with m inputs:

SSE = ½ × Σ_{j=1}^{n} (t_j − Σ_{i=0}^{m} (w_i × x_{j,i}))²

Calculating the partial derivative of the SSE with respect to a specific weight w_i involves the application of the chain rule from calculus and a number of standard differentiation rules. The result of this derivation is the following equation (for simplicity of presentation we switch back to the y_j notation to represent the output from the model):

∂SSE/∂w_i = Σ_{j=1}^{n} ((t_j − y_j) × (−x_{j,i}))

This partial derivative specifies how to calculate the error gradient for weight w_i for the dataset, where x_{j,i} is the input associated with w_i in each example j of the dataset. This calculation involves multiplying two terms, the error of the output (t_j − y_j) and the rate of change of the output (i.e., the weighted sum) with respect to changes in the weight. One way of understanding this calculation is that if changing the weight changes the output of the weighted sum by a large amount, then the gradient of the error with respect to the weight is large (steep) because changing the weight will result in big changes in the error. However, this gradient is the uphill gradient, and we wish to move the weights so as to move down the error curve. So in the gradient descent weight update rule (shown below) the "–" sign in front of the input is dropped. Using τ to represent the iteration of the algorithm (an iteration involves a single pass through the examples in the dataset), the gradient descent weight update rule is defined as:

w_i^(τ+1) = w_i^(τ) + η × Σ_{j=1}^{n} ((t_j − y_j) × x_{j,i})
There are a number of notable factors about this weight update rule. First, the rule specifies how the weight should be updated after each iteration through the dataset. This update is proportional to the gradient of the error curve for the weight for that iteration (i.e., the summation term, which actually defines the partial derivative of the SSE for that weight). Second, the weight update rule can be used to update the weights for functions with multiple inputs. This means that the gradient descent algorithm can be used to descend error surfaces with more than two weight coordinates. It is not possible to visualize these error surfaces because they will have more than three dimensions, but the basic principle of descending an error surface by following the error gradient generalizes to learning functions with multiple inputs. Third, although the weight update rule has a similar structure for each weight, the rule does define a different update for each weight during each iteration because the update is dependent on the inputs in the dataset examples to which the weight is applied. Fourth, the summation in the rule indicates that, in each iteration of the gradient descent algorithm, the current model should be applied to all of the examples in the dataset. This is one of the reasons why training a deep learning network is such a computationally expensive task. Typically, for very large datasets, the dataset is split up into batches of examples sampled from the dataset, and each iteration of training is based on a batch, rather than the entire dataset. Fifth, apart from the modifications necessary to include the summation, this rule is identical to the LMS (also known as the Widrow-Hoff or delta) learning rule introduced in chapter 4, and the rule implements the same logic: if the output of the model is too large, then weights associated with positive inputs should be reduced; if the output is too small, then these weights should be increased. Moreover, the purpose and function of the learning rate hyperparameter (η) is the same as in the LMS rule: it scales the weight adjustments to ensure that the adjustments aren't so large that the algorithm misses (or steps over) the best set of weights. Using this weight update rule, the gradient descent algorithm can be summarized as follows:
1. Construct a model using an initial set of weights.
2. Repeat until the model performance is good enough:
   a. Apply the current model to the examples in the dataset.
   b. Adjust each weight using the weight update rule.
3. Return the final model.
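The summary above translates almost line for line into code. The sketch below applies the gradient descent weight update rule to the linear model and the table 6.1 data; the learning rate, the iteration count, and the random seed are illustrative choices, not values from the text.

```python
# A minimal gradient descent loop for the linear model y = w0 + w1*x on the table 6.1 data.
import numpy as np

x = np.array([0.72, 0.45, 0.23, 0.76, 0.14])
t = np.array([0.54, 0.56, 0.38, 0.57, 0.17])

rng = np.random.default_rng(0)
w0, w1 = rng.normal(size=2)            # 1. construct a model with random initial weights
eta = 0.1                              # learning rate

for iteration in range(5000):          # 2. repeat until the model is good enough
    y = w0 + w1 * x                    #    a. apply the current model to every example
    error = t - y
    w0 += eta * np.sum(error * 1.0)    #    b. weight update rule (dummy input = 1 for w0)
    w1 += eta * np.sum(error * x)

print(round(w0, 3), round(w1, 3))      # 3. return the final model: ~0.203 and ~0.524
```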
One consequence of the independent updating of weights, and the fact that weight updates are proportional to the local gradient on the associated error curve, is that the path the gradient descent algorithm follows to the lowest point on the error surface may not be a straight line. This is because the gradient of each of the component error curves may not be equal at each location on the error surface (the gradient for one of the weights may be steeper than the gradient for the other weight). As a result, one weight may be updated by a larger amount than another weight during a given iteration, and thus the descent to the valley floor may not follow a direct route. Figure 6.4 illustrates this phenomenon. Figure 6.4 presents a set of top-down views of a portion of a contour plot of an error surface. This error surface is a valley that is quite long and narrow, with steeper sides and gentler sloping ends; the steepness is reflected by the closeness of the contours. As a result, the search initially moves across the valley before turning toward the center of the valley. The plot on the left illustrates the first iteration of the gradient descent algorithm. The initial starting point is the location where the three arrows in this plot meet. The lengths of the dotted and dashed arrows represent the local gradients of the w0 and w1 error curves, respectively. The dashed arrow is longer than the dotted arrow, reflecting the fact that the local gradient of the w1 error curve is steeper than that of the w0 error curve. In each iteration, each of the weights is updated in proportion to the gradient of its error curve; so in the first iteration, the update for w1 is larger than the update for w0, and therefore the overall movement is greater across the valley than along the valley. The thick black arrow illustrates the overall movement in the underlying weight space resulting from the weight updates in this first iteration. Similarly, the middle plot illustrates the error gradients and overall weight update for the next iteration of gradient descent. The plot on the right shows the complete path of descent taken by the search process from the initial location to the global minimum (the lowest point on the error surface).
Figure 6.4 Top-down views of a portion of a contour plot of an error surface, illustrating the gradient descent path across the error surface. Each of the thick arrows illustrates the overall movement of the weight vector for a single iteration of the gradient descent algorithm. The lengths of the dotted and dashed arrows represent the local gradient of the w0 and w1 error curves, respectively, for that iteration. The plot on the right shows the overall path taken to the global minimum of the error surface.
It is relatively straightforward to map the weight update rule over to training a single neuron. In this mapping, the weight w_0 is the bias term for the neuron, and the other weights are associated with the other inputs to the neuron. The derivation of the partial derivative of the SSE is dependent on the structure of the function that generates y_j. The more complex this function is, the more complex the partial derivative becomes. The fact that the function a neuron defines includes both a weighted summation and an activation function means that the partial derivative of the SSE with respect to a weight in a neuron is more complex than the partial derivative given above. The inclusion of the activation function within the neuron results in an extra term in the partial derivative of the SSE. This extra term is the derivative of the activation function with respect to the output from the weighted summation function. The derivative of the activation function is taken with respect to the output of the weighted summation function because this is the input that the activation function receives. The activation function does not receive the weight directly. Instead, changes in the weight only affect the output of the activation function indirectly through the effect that these weight changes have on the output of the weighted summation. The main reason why the logistic function was such a popular activation function in neural networks for so long is that it has a very straightforward derivative with respect to its inputs. The gradient descent weight update rule for a neuron using the logistic function is as follows:

w_i^(τ+1) = w_i^(τ) + η × Σ_{j=1}^{n} ((t_j − y_j) × y_j × (1 − y_j) × x_{j,i})
The fact that the weight update rule includes the derivative of the activation function means that the weight update rule will change if the activation function of the neuron is changed. However, this change will simply involve updating the derivative of the activation function; the overall structure of the rule will remain the same.
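The extended rule can be exercised on the same dataset. The sketch below trains a single neuron with a logistic activation on the table 6.1 data (whose targets already lie in the 0 to 1 range that the logistic function outputs); the learning rate, iteration count, and seed are illustrative choices.

```python
# Gradient descent for a single logistic neuron: the update includes the extra
# y * (1 - y) term, the derivative of the logistic activation function.
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.72, 0.45, 0.23, 0.76, 0.14])
t = np.array([0.54, 0.56, 0.38, 0.57, 0.17])

rng = np.random.default_rng(0)
w0, w1 = rng.normal(size=2)            # bias weight and input weight
eta = 0.5

for iteration in range(20000):
    z = w0 + w1 * x                    # weighted summation (dummy input = 1 for w0)
    y = logistic(z)                    # activation
    delta = (t - y) * y * (1 - y)      # error times the derivative of the activation
    w0 += eta * np.sum(delta * 1.0)
    w1 += eta * np.sum(delta * x)

print(np.round(logistic(w0 + w1 * x), 2))   # predictions approximate the targets
```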
This extended weight update rule means that the gradient descent algorithm can be used to train a single neuron. It cannot, however, be used to train neural networks with multiple layers of neurons, because the definition of the error gradient for a weight depends on the error of the output of the function, the (t_j − y_j) term. Although it is possible to calculate the error of the output of a neuron in the output layer of the network by directly comparing the output with the expected output, it is not possible to calculate this error term directly for the neurons in the hidden layers of the network, and as a result it is not possible to calculate the error gradients for those weights. The backpropagation algorithm is a solution to the problem of calculating error gradients for the weights in the hidden layers of the network.
Training a Neural Network Using Backpropagation
The term backpropagation has two different meanings. The primary meaning is that it is an algorithm that can be used to calculate, for each neuron in a network, the sensitivity (gradient/rate-of-change) of the error of the network to changes in the weights. Once the error gradient for a weight has been calculated, the weight can then be adjusted to reduce the overall error of the network using a weight update rule similar to the gradient descent weight update rule. In this sense, the backpropagation algorithm is a solution to the credit assignment problem, introduced in chapter 4. The second meaning of backpropagation is that it is a complete algorithm for training a neural network. This second meaning encompasses the first sense, but also includes a learning rule that defines how the error gradients of the weights should be used to update the weights within the network. Consequently, the algorithm described by this second meaning involves a two-step process: solve the credit assignment problem, and then use the error gradients of the weights, calculated during credit assignment, to update the weights in the network. It is useful to distinguish between these two meanings of backpropagation because there are a number of different learning rules that can be used to update the weights, once the credit assignment problem has been resolved. The learning rule that is most commonly used with backpropagation is the gradient descent algorithm introduced earlier. The description of the backpropagation algorithm given here focuses on the first meaning of backpropagation, that of the algorithm being a solution to the credit assignment problem.
Backpropagation: The Two-Stage Algorithm
The backpropagation algorithm begins by initializing all the weights of the network using random values. Note that even a randomly initialized network can still generate an output when an input is presented to the network, although it is likely to be an output with a large error. Once the network weights have been initialized, the network can be trained by iteratively updating the weights so as to reduce the error of the network, where the error of the network is calculated in terms of the difference between the output generated by the network in response to an input pattern and the expected output for that input, as defined in the training dataset. A crucial step in this iterative weight adjustment process involves solving the credit assignment problem, or, in other words, calculating the error gradients for each weight in the network. The backpropagation algorithm solves this problem using a two-stage process. In the first stage, known as the forward pass, an input pattern is presented to the network, and the resulting neuron activations flow forward through the network until an output is generated. Figure 6.5 illustrates the forward pass of the backpropagation algorithm. In this figure, the weighted summation of inputs calculated at each neuron (e.g., z_1 represents the weighted summation of inputs calculated for neuron 1) and the output (or activation, e.g., a_1 represents the activation of neuron 1) of each neuron are shown. The reason for listing the z and a values for each neuron in this figure is to highlight the fact that during the forward pass both of these values, for each neuron, are stored in memory. The reason they are stored in memory is that they are used in the backward pass of the algorithm. The z value for a neuron is used to calculate the update to the weights on the input connections to the neuron. The a value for a neuron is used to calculate the update to the weights on the output connections from the neuron. The specifics of how these values are used in the backward pass will be described below.
The second stage, known as the backward pass, begins by calculating an error gradient for each neuron in the output layer. These error gradients represent the sensitivity of the network error to changes in the weighted summation calculation of the neuron, and they are often denoted by the shorthand notation δ (pronounced delta) with a subscript indicating the neuron. For example, δ_k is the gradient of the network error with respect to small changes in the weighted summation calculation of neuron k. It is important to recognize that there are two different error gradients calculated in the backpropagation algorithm:
1. The first is the δ value for each neuron. The δ for each neuron is the rate of change of the error of the network with respect to changes in the weighted summation calculation of the neuron. There is one δ for each neuron. It is these error gradients that the algorithm backpropagates.
2. The second is the error gradient of the network with respect to changes in the weights of the network. There is one of these error gradients for each weight in the network. These are the error gradients that are used to update the weights in the network. However, it is necessary to first calculate the δ term for each neuron (using backpropagation) in order to calculate the error gradients for the weights.
Note there is only a single δ per neuron, but there may be many weights associated with that neuron, so the δ term for a neuron may be used in the calculation of multiple weight error gradients.
Once the δs for the output neurons have been calculated, the δs for the neurons in the last hidden layer are then calculated. This is done by assigning a portion of the δ from each output neuron to each hidden neuron that is directly connected to it. This assignment of blame, from output neuron to hidden neuron, is dependent on the weight of the connection between the neurons, and the activation of the hidden neuron during the forward pass (this is why the activations are recorded in memory during the forward pass). Once the blame assignment, from the output layer, has been completed, the δ for each neuron in the last hidden layer is calculated by summing the portions of the δs assigned to the neuron from all of the output neurons it connects to. The same process of blame assignment and summing is then repeated to propagate the error gradients back from the last layer of hidden neurons to the neurons in the second-to-last layer, and so on, back to the input layer. It is this backward propagation of δs through the network that gives the algorithm its name. At the end of this backward pass there is a δ calculated for each neuron in the network (i.e., the credit assignment problem has been solved), and these δs can then be used to update the weights in the network (using, for example, the gradient descent algorithm introduced earlier). Figure 6.6 illustrates the backward pass of the backpropagation algorithm. In this figure, the δs get smaller and smaller as the backpropagation process gets further from the output layer. This reflects the vanishing gradient problem discussed in chapter 4 that slows down the learning rate of the early layers of the network.
Figure 6.5 The forward pass of the backpropagation algorithm.
In summary, the main steps within each iteration of the backpropagation algorithm are as follows: 1. Present an input to the network and allow the neuron activations to flow forward through the network until an output is generated. Record both the weighted sum and the activation of each neuron.
Figure 6.6 The backward pass of the backpropagation algorithm.
2. Calculate a δ (delta) error gradient for each neuron in the output layer.
3. Backpropagate the error gradients to obtain a δ (delta) error gradient for each neuron in the network.
4. Use the δ error gradients and a weight update algorithm, such as gradient descent, to calculate the error gradients for the weights and use these to update the weights in the network.
The algorithm continues iterating through these steps until the error of the network is reduced (or converged) to an acceptable level.
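The full two-stage loop can be sketched in a few dozen lines. The example below is not from the text: the network shape (2 inputs, 3 hidden logistic neurons, 1 logistic output), the XOR training data, the learning rate, and the seed are illustrative choices, but the structure follows the steps above: a forward pass that stores the weighted sums (z) and activations (a), a backward pass that computes a δ per neuron, and gradient descent weight updates.

```python
# Forward pass, backward pass (delta backpropagation), and gradient descent updates
# for a tiny fully connected network trained on XOR.
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(42)
W1 = rng.normal(size=(3, 3))   # hidden layer weights (last column is the bias weight)
W2 = rng.normal(size=(1, 4))   # output layer weights (last column is the bias weight)
eta = 0.5

inputs  = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
targets = np.array([0.0, 1.0, 1.0, 0.0])

for iteration in range(20000):
    for x, t in zip(inputs, targets):
        # forward pass: store z and a for every neuron
        x1 = np.append(x, 1.0)             # add the bias (dummy) input
        z1 = W1 @ x1                       # weighted sums of the hidden neurons
        a1 = logistic(z1)                  # activations of the hidden neurons
        a1b = np.append(a1, 1.0)
        z2 = W2 @ a1b                      # weighted sum of the output neuron
        a2 = logistic(z2)                  # output of the network

        # backward pass: output-layer delta, then backpropagate to the hidden layer
        delta2 = (t - a2) * a2 * (1 - a2)                 # output neuron delta
        delta1 = (W2[:, :3].T @ delta2) * a1 * (1 - a1)   # hidden neuron deltas

        # weight updates: delta of the receiving neuron times the activation it received
        W2 += eta * np.outer(delta2, a1b)
        W1 += eta * np.outer(delta1, x1)

print(np.round([logistic(W2 @ np.append(logistic(W1 @ np.append(x, 1.0)), 1.0))[0]
                for x in inputs], 2))
# should be close to [0, 1, 1, 0]; the exact result depends on the random initialization
```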
Backpropagation: Backpropagating the δs
A δ term of a neuron describes the error gradient for the network with respect to changes in the weighted summation of inputs calculated by the neuron. To help make this more concrete, figure 6.7 (top) breaks open the processing stages within a neuron and uses the term z to denote the result of the weighted summation within the neuron. The neuron in this figure receives inputs (or activations) from three other neurons, and z is the weighted sum of these activations. The output of the neuron, a, is then calculated by passing z through a nonlinear activation function, such as the logistic function. Using this notation, a δ for a neuron is the rate of change of the error of the network with respect to small changes in the value of z. Mathematically, this term is the partial derivative of the network's error with respect to z:

δ_k = ∂Error/∂z_k
No matter where in a network a neuron is located (output layer or hidden layer), the δ for the neuron is calculated as the product of two terms: 1. the rate of change of the network error in response to changes in the neuron's activation (output): ∂Error/∂a_k;
Figure 6.7 Top: the forward propagation of activations through the weighted sum and activation function of a neuron. Middle: The calculation of the δ term for an output neuron (tk is the expected activation for the neuron and ak is the actual activation). Bottom: The calculation of the δ term for a hidden neuron. This figure is loosely inspired by figure 5.2 and figure 5.3 in Reed and Marks II 1999.
2. the rate of change of the activation of the neuron with respect to changes in the weighted sum of inputs to the neuron: ∂a_k/∂z_k.
Figure 6.7 (middle) illustrates how this product is calculated for neurons in the output layer of a network. The first step is to calculate the rate of change of the error of the network with respect to the output of the neuron, the ∂Error/∂a_k term. Intuitively, the larger the difference between the activation of a neuron, a_k, and the expected activation, t_k, the faster the error can be changed by changing the activation of the neuron. So the rate of change of the error of the network with respect to changes in the activation of an output neuron can be calculated by subtracting the neuron's activation (a_k) from the expected activation (t_k):

∂Error/∂a_k = t_k − a_k
This term connects the error of the network to the output of the neuron. The neuron's δ, however, is the rate of change of the error with respect to the input to the activation function (z_k), not the output of that function (a_k). Consequently, in order to calculate the δ for the neuron, the t_k − a_k value must be propagated back through the activation function to connect it to the input to the activation function. This is done by multiplying t_k − a_k by the rate of change of the activation function with respect to the input value to the function, which in figure 6.7 is denoted by the term ∂a_k/∂z_k. This term is calculated by plugging the z_k value (stored from the forward pass through the network) into the equation of the derivative of the activation function with respect to z. For example, the derivative of the logistic function with respect to its input is:

logistic'(z) = logistic(z) × (1 − logistic(z))
Figure 6.8 plots this function and shows that plugging a z value into this equation will result in a value between 0 and 0.25. For example, figure 6.8 shows that the derivative reaches its maximum value of 0.25 when z = 0 and falls toward 0 as z moves away from 0 in either direction. This is why the weighted summation value for each neuron (z) is stored during the forward pass of the algorithm.
The fact that the calculation of a neuron's δ involves a product that includes the derivative of the neuron's activation function makes it necessary to be able to take the derivative of the neuron's activation function. It is not possible to take the derivative of a threshold activation function because there is a discontinuity in the function at the threshold. As a result, the backpropagation algorithm does not work for networks composed of neurons that use threshold activation functions. This is one of the reasons why neural networks moved away from threshold activations and started to use the logistic and tanh activation functions. The logistic and tanh functions both have very simple derivatives, and this made them particularly suitable for backpropagation.
Figure 6.8 Plots of the logistic function and the derivative of the logistic function.
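The values plotted in figure 6.8 can be checked directly; the sample z values below are illustrative.

```python
# The derivative of the logistic function, logistic(z) * (1 - logistic(z)), peaks at
# 0.25 when z = 0 and shrinks toward 0 as z moves away from 0 in either direction.
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

for z in [-4.0, -2.0, 0.0, 2.0, 4.0]:
    print(z, round(logistic(z) * (1 - logistic(z)), 4))
# prints 0.0177, 0.105, 0.25, 0.105, 0.0177 for these z values
```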
Figure 6.7 (bottom) illustrates how the δ for a neuron in a hidden layer is calculated. This involves the same product of terms as was used for neurons in the output layer. The difference is that the calculation of the ∂Error/∂a_k term is more complex for hidden units. For hidden neurons, it is not possible to directly connect the output of the neuron with the error of the network. The output of a hidden neuron only indirectly affects the overall error of the network through the variations that it causes in the downstream neurons that receive the output as input, and the magnitude of these variations is dependent on the weight each of these downstream neurons applies to the output. Furthermore, this indirect effect on the network error is in turn dependent on the sensitivity of the network error to these later neurons, that is, their δ values. Consequently, the sensitivity of the network error to the output of a hidden neuron k can be calculated as a weighted sum of the δ values of the neurons immediately downstream of the neuron:

∂Error/∂a_k = Σ_i (w_{k,i} × δ_i)

where the sum runs over the downstream neurons i, and w_{k,i} is the weight on the connection from neuron k to neuron i.
As a result, the error terms (the δ values) for all the downstream neurons to which a neuron's output is passed in the forward pass must be calculated before the δ for neuron k can be calculated. This, however, is not a problem, because in the backward pass the algorithm is working backward through the network and will have calculated the δ terms for the downstream neurons before it reaches neuron k.
For hidden neurons, the other term in the product, ∂a_k/∂z_k, is calculated in the same way as it is calculated for output neurons: the z_k value for the neuron (the weighted summation of inputs, stored during the forward pass through the network) is plugged into the derivative of the neuron's activation function with respect to z.
Backpropagation: Updating the Weights
The fundamental principle of the backpropagation algorithm in adjusting the weights in a network is that each weight in a network should be updated in proportion to the sensitivity of the overall error of the network to changes in that weight. The intuition is that if the overall error of the network is not affected by a change in a weight, then the error of the network is independent of that weight, and, therefore, the weight did not contribute to the error. The sensitivity of the network error to a change in an individual weight is measured in terms of the rate of change of the network error in response to changes in that weight.
The overall error of a network is a function with multiple inputs: both the inputs to the network and all the weights in the network. So, the rate of change of the error of a network in response to changes in a given network weight is calculated by taking the partial derivative of the network error with respect to that weight. In the backpropagation algorithm, the partial derivative of the network error for a given weight is calculated using the chain rule. Using the chain rule, the partial derivative of the network error with respect to a weight w_{i,k} on the connection between a neuron i and a neuron k is calculated as the product of two terms:
1. the first term describes the rate of change of the weighted sum of inputs in neuron k with respect to changes in the weight w_{i,k};
2. and the second term describes the rate of change of the network error in response to changes in the weighted sum of inputs calculated by neuron k (this second term is the δ for neuron k).
Figure 6.9 shows how the product of these two terms connects a weight to the output error of the network. The figure shows the processing of the last two neurons in a network with a single path of activation: a hidden neuron i and an output neuron k. Neuron i receives a single input x, and the output of neuron i is the sole input to neuron k. The output of neuron k is the output of the network. There are two weights in this portion of the network: w_{x,i}, the weight applied to the input x by neuron i, and w_{i,k}, the weight applied by neuron k to the activation it receives from neuron i.
The calculations shown in figure 6.9 appear complicated because they contain a number of different components. However, as we will see, by stepping through these calculations, each of the individual elements is actually easy to calculate; it’s just keeping track of all the different elements that poses a difficulty.
Figure 6.9 An illustration of how the product of derivatives connects weights in the network to the error of the network.
Focusing on w_{i,k}, this weight is applied to an input of the output neuron of the network. There are two stages of processing between this weight and the network output (and error): the first is the weighted sum z_k calculated in neuron k; the second is the nonlinear function applied to this weighted sum by the activation function of neuron k. Working backward from the output, the δ_k term is calculated using the calculation shown in the middle figure of figure 6.7: the difference between the target activation t_k for the neuron and the actual activation a_k is calculated and is multiplied by the partial derivative of the neuron's activation function with respect to its input (the weighted sum z_k), ∂a_k/∂z_k. Assuming that the activation function used by neuron k is the logistic function, the ∂a_k/∂z_k term is calculated by plugging the z_k value (stored during the forward pass of the algorithm) into the derivative of the logistic function:

∂a_k/∂z_k = logistic(z_k) × (1 − logistic(z_k))
So the calculation of under the assumption that neuron uses a logistic function is:
The δ_k term connects the error of the network to the input of the activation function (the weighted sum z_k). However, we wish to connect the error of the network back to the weight w_{j,k}. This is done by multiplying the δ_k term by the partial derivative of the weighted summation function with respect to the weight: ∂z_k/∂w_{j,k}. This partial derivative describes how the output of the weighted sum function changes as the weight changes. Because the weighted summation function is a linear function of weights and activations, when the partial derivative is taken with respect to a particular weight, all the terms in the function that do not involve that weight are treated as constants and go to zero, and the partial derivative simplifies to just the input associated with that weight, in this instance the activation a_j of neuron j.
This is why the activations of each neuron in the network are stored during the forward pass. Taken together, these two terms, ∂z_k/∂w_{j,k} and δ_k, connect the weight to the network error: the first connects the weight to z_k, and the second connects z_k to the activation of the neuron, and thereby to the network error. So, the error gradient of the network with respect to changes in weight w_{j,k} is calculated as the product: δ_k × a_j.
The other weight in the figure 6.9 network, w_{i,j}, is deeper in the network, and, consequently, there are more processing steps between it and the network output (and error). The δ_j term for neuron j is calculated, through backpropagation (as shown at the bottom of figure 6.7), using the following product of terms: δ_j = ∂a_j/∂z_j × w_{j,k} × δ_k.
Assuming the activation function used by neuron j is the logistic function, the ∂a_j/∂z_j term is calculated in a similar way to ∂a_k/∂z_k: the value z_j is plugged into the equation for the derivative of the logistic function. So, written out in long form, the calculation of δ_j is: δ_j = logistic(z_j) × (1 − logistic(z_j)) × w_{j,k} × δ_k.
However, in order to connect the weight w_{i,j} with the error of the network, the δ_j term must be multiplied by the partial derivative of the weighted summation function with respect to the weight: ∂z_j/∂w_{i,j}. As described above, the partial derivative of a weighted sum function with respect to a weight reduces to the input associated with the weight (i.e., the input a_i to neuron j), and the gradient of the network's error with respect to the hidden weight w_{i,j} is calculated by multiplying δ_j by a_i. Consequently, the product of these terms (a_i and δ_j) forms a chain connecting the weight w_{i,j} to the network error. For completeness, the product of terms for the gradient with respect to w_{i,j}, assuming logistic activation functions in both neurons, is: a_i × logistic(z_j) × (1 − logistic(z_j)) × w_{j,k} × (t_k − a_k) × logistic(z_k) × (1 − logistic(z_k)).
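To make this chain of terms concrete, the following minimal Python sketch computes the forward pass and the two gradients for the two-neuron chain of figure 6.9. The input, target, and weight values are arbitrary illustrative assumptions, not values taken from the figure.

```python
# A minimal numerical sketch of the gradient chain described above for the
# two-neuron network of figure 6.9. The input, target, and weight values
# are arbitrary illustrative assumptions.
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

# Forward pass (weighted sums and activations are stored for the backward pass).
a_i, t_k = 0.5, 1.0          # network input and target output (illustrative)
w_ij, w_jk = 0.4, -0.3       # the two weights in this portion of the network

z_j = w_ij * a_i             # weighted sum in neuron j
a_j = logistic(z_j)          # activation of neuron j
z_k = w_jk * a_j             # weighted sum in neuron k
a_k = logistic(z_k)          # activation of neuron k = network output

# Backward pass: delta terms, working from the output back toward the input.
delta_k = (t_k - a_k) * a_k * (1 - a_k)        # output neuron k
delta_j = a_j * (1 - a_j) * w_jk * delta_k     # hidden neuron j

# Error gradients: each weight's delta term times the input on that connection.
grad_w_jk = delta_k * a_j
grad_w_ij = delta_j * a_i
print(grad_w_jk, grad_w_ij)
```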
Although this discussion has been framed in the context of a very simple network with only a single path of connections, it generalizes to more complex networks because the calculation of the δ terms for hidden units already takes account of the multiple connections emanating from a neuron. Once the gradient of the network error with respect to a weight has been calculated, the weight can be adjusted so as to reduce the error of the network using the gradient descent weight update rule, applied to the weight w_{j,k} on the connection between neuron j and neuron k at iteration t of the algorithm.
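One standard form of this update, consistent with the sign convention used for δ_k above (where δ_k already carries the (t_k − a_k) factor), adds the scaled error gradient to the current weight, with η denoting the learning rate:

w_{j,k}^(t+1) = w_{j,k}^(t) + η × δ_k × a_j

and analogously w_{i,j}^(t+1) = w_{i,j}^(t) + η × δ_j × a_i for the deeper weight.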
Finally, an important caveat on training neural networks with backpropagation and gradient descent is that the error surface of a neural network is much more complex than that of a linear model. Figure 6.3 illustrated the error surface of a linear model as a smooth convex bowl with a single global minimum (a single best set of weights). However, the error surface of a neural network is more like a mountain range with multiple valleys and peaks. This is because each of the neurons in a network includes a nonlinear function in its mapping of inputs to outputs, and so the function implemented by the network is a nonlinear function. Including a nonlinearity within the neurons of a network increases the expressive power of the network in terms of its ability to learn more complex functions. However, the price paid for this is that the error surface becomes more complex, and the gradient descent algorithm is no longer guaranteed to find the set of weights that defines the global minimum on the error surface; instead, it may get stuck in a local minimum. Fortunately, however, backpropagation and gradient descent can still often find sets of weights that define useful models, although searching for useful models may require running the training process multiple times to explore different parts of the error surface landscape.
7 The Future of Deep Learning
On March 27, 2019, Yoshua Bengio, Geoffrey Hinton, and Yann LeCun jointly received the ACM A.M. Turing Award. The award recognized the contributions they have made to deep learning becoming the key technology driving the modern artificial intelligence revolution. Often described as the “Nobel Prize for Computing,” the ACM A.M. Turing Award carries a $1 million prize. Sometimes working together, and at other times working independently or in collaboration with others, these three researchers have, over several decades of work, made numerous contributions to deep learning, ranging from the popularization of backpropagation in the 1980s to the development of convolutional neural networks, word embeddings, attention mechanisms in networks, and generative adversarial networks (to list just some examples). The announcement of the award noted the astonishing recent breakthroughs that deep learning has led to in computer vision, robotics, speech recognition, and natural language processing, as well as the profound impact these technologies are having on society, with billions of people now using deep learning based artificial intelligence on a daily basis through smartphone applications. The announcement also highlighted how deep learning has provided scientists with powerful new tools that are resulting in scientific breakthroughs in areas as diverse as medicine and astronomy. The awarding of this prize to these researchers reflects the importance of deep learning to modern science and society. The transformative effects of deep learning on technology are set to increase over the coming decades, with the development and adoption of deep learning continuing to be driven by the virtuous cycle of ever larger datasets, new algorithms, and improved hardware. These trends are not stopping, and how the deep learning community responds to them will drive growth and innovation within the field over the coming years.
Big Data Driving Algorithmic Innovations
Chapter 1 introduced the different types of machine learning: supervised, unsupervised, and reinforcement learning. Most of this book has focused on supervised learning, primarily because it is the most popular form of machine learning. However, a difficulty with supervised learning is that it can cost a lot of money and time to annotate a dataset with the necessary target labels. As datasets continue to grow, the cost of data annotation is becoming a barrier to the development of new applications. The ImageNet dataset provides a useful example of the scale of the annotation task involved in deep learning projects. The dataset was released in 2010 and is the basis for the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC). This is the challenge that the AlexNet CNN won in 2012 and the ResNet system won in 2015. As was discussed in chapter 4, AlexNet winning the 2012 ILSVRC challenge generated a lot of excitement about deep learning models. However, the AlexNet win would not have been possible without the creation of the ImageNet dataset. This dataset contains more than fourteen million images that have been manually annotated to indicate which objects are present in each image, and more than one million of the images have also been annotated with the bounding boxes of the objects in the image. Annotating data at this scale required a significant research effort and budget, and was achieved using crowdsourcing platforms. It is not feasible to create annotated datasets of this size for every application.
One response to this annotation challenge has been a growing interest in unsupervised learning. The autoencoder models used in Hinton’s pretraining (see chapter 4) are one neural network approach to unsupervised learning, and in recent years different types of autoencoders have been proposed. Another approach to this problem is to train generative models. Generative models attempt to learn the distribution of the data (or, to model the process that generated the data). Similar to autoencoders, generative models are often used to learn a useful representation of the data prior to training a supervised model. Generative adversarial networks (GANs) are an approach to training generative models that has received a lot of attention in recent years (Goodfellow et al. 2014). A GAN consists of two neural networks, a generative model and a discriminative model, and a sample of real data. The models are trained in an adversarial manner. The task of the discriminative model is to learn to discriminate between real data sampled from the dataset, and fake data that has been synthesized by the generator. The task of the generator is to learn to synthesize fake data that can fool the discriminative model. Generative models trained using a GAN can learn to synthesize fake images that mimic an artistic style (Elgammal et al. 2017), and also to synthesize medical images along with lesion annotations (Frid-Adar et al. 2018). Learning to synthesize medical images, along with the segmentation of the lesions in the synthesized image, opens the possibility of automatically generating massive labeled datasets that can be used for supervised learning. A more worrying application of GANs is the use of these networks to generate deep fakes: a deep fake is a fake video of a person doing something they never did that is created by swapping their face into a video of someone else. Deep fakes are very hard to detect, and have been used maliciously on a number of occasions to embarrass public figures, or to spread fake news stories.
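As a rough illustration of this adversarial setup, the following sketch shows a minimal GAN training step in PyTorch. The layer sizes, optimizers, and the real_batch data source are illustrative assumptions, not the architecture from Goodfellow et al. (2014).

```python
# A minimal sketch of the adversarial training loop described above.
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 64

generator = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))
discriminator = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
bce = nn.BCELoss()

def training_step(real_batch):
    batch_size = real_batch.shape[0]
    real_labels = torch.ones(batch_size, 1)
    fake_labels = torch.zeros(batch_size, 1)

    # 1. Train the discriminator to tell real data from generated data.
    noise = torch.randn(batch_size, latent_dim)
    fake_batch = generator(noise).detach()          # freeze the generator for this step
    d_loss = bce(discriminator(real_batch), real_labels) + \
             bce(discriminator(fake_batch), fake_labels)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2. Train the generator to fool the discriminator.
    noise = torch.randn(batch_size, latent_dim)
    g_loss = bce(discriminator(generator(noise)), real_labels)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```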
Another solution to the data labeling bottleneck is, rather than training a new model from scratch for each new application, to repurpose models that have been trained on a similar task. Transfer learning is the machine learning challenge of using information (or representations) learned on one task to aid learning on another task. For transfer learning to work, the two tasks should be from related domains. Image processing is an example of a domain where transfer learning is often used to speed up the training of models across different tasks. Transfer learning is appropriate for image processing tasks because low-level visual features, such as edges, are relatively stable and useful across nearly all visual categories. Furthermore, the fact that CNN models learn a hierarchy of visual features, with the early layers of a CNN learning functions that detect these low-level visual features in the input, makes it possible to repurpose the early layers of pretrained CNNs across multiple image processing projects. For example, imagine a scenario where a project requires an image classification model that can identify objects from specialized categories for which there are no samples in general image datasets such as ImageNet. Rather than training a new CNN model from scratch, it is now relatively standard to first download a state-of-the-art model (such as the Microsoft ResNet model) that has been trained on ImageNet, then replace the later layers of the model with a new set of layers, and finally to train this new hybrid model on a relatively small dataset that has been labeled with the appropriate categories for the project. The later layers of the state-of-the-art (general) model are replaced because these layers contain the functions that combine the low-level features into the task-specific categories the model was originally trained to identify. The fact that the early layers of the model have already been trained to identify the low-level visual features speeds up the training and reduces the amount of data needed to train the new project-specific model.
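The following sketch illustrates this recipe using a pretrained ResNet from the torchvision library; the choice of resnet18 and the number of new output categories are assumptions made for illustration.

```python
# A sketch of the transfer-learning recipe described above.
import torch.nn as nn
from torchvision import models

num_classes = 5  # hypothetical number of specialized project categories

model = models.resnet18(pretrained=True)   # layers already trained on ImageNet

# Freeze the early layers so their low-level feature detectors are reused as-is.
for param in model.parameters():
    param.requires_grad = False

# Replace the final (task-specific) layer with a new, trainable classifier head.
model.fc = nn.Linear(model.fc.in_features, num_classes)

# The model can now be fine-tuned on a relatively small labeled dataset.
```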
The increased interest in unsupervised learning, generative models, and transfer learning can all be understood as a response to the challenge of annotating increasingly large datasets.
The Emergence of New Models
The rate at which new deep learning models emerge is accelerating every year. A recent example is capsule networks (Hinton et al. 2018; Sabour et al. 2017). Capsule networks are designed to address some of the limitations of CNNs. One problem with CNNs, sometimes known as the Picasso problem, is that a CNN ignores the precise spatial relationships between the high-level components within an object's structure. What this means in practice is that a CNN that has been trained to identify faces may learn to identify the shapes of the eyes, the nose, and the mouth, but will not learn the required spatial relationships between these parts. Consequently, the network can be fooled by an image that contains these parts even if they are not in the correct relative positions to each other. This problem arises because of the pooling layers in CNNs, which discard positional information.
At the core of capsule networks is the intuition that the human brain learns to identify object types in a viewpoint invariant manner. Essentially, for each object type there is an object class that has a number of instantiation parameters. The object class encodes information such as the relative relationship of different object parts to each other. The instantiation parameters control how the abstract description of an object type can be mapped to the specific instance of the object that is currently in view (for example, its pose, scale, etc.).
A capsule is a set of neurons that learns to identify whether a specific type of object or object part is present at a particular location in an image. A capsule outputs an activity vector that represents the instantiation parameters of the object instance, if one is present at the relevant location. Capsules are embedded within convolutional layers. However, capsule networks replace the pooling process, which often defines the interface between convolutional layers, with a process called dynamic routing. The idea behind dynamic routing is that each capsule in one layer in the network learns to predict which capsule in the next layer is the most relevant capsule for it to forward its output vector to.
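The following sketch gives a rough flavor of routing by agreement, the dynamic routing scheme proposed by Sabour et al. (2017). The tensor shapes and the number of routing iterations are illustrative assumptions, and a full capsule network wraps this procedure inside convolutional capsule layers.

```python
# A rough sketch of "routing by agreement" between two capsule layers.
import torch
import torch.nn.functional as F

def squash(s, dim=-1):
    # Shrink short vectors toward zero and long vectors toward unit length.
    norm_sq = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm_sq / (1.0 + norm_sq)) * s / torch.sqrt(norm_sq + 1e-9)

def dynamic_routing(u_hat, num_iterations=3):
    # u_hat: predictions from lower-level capsules for each higher-level capsule,
    # with shape (num_lower, num_higher, dim_higher).
    b = torch.zeros(u_hat.shape[:2])                # routing logits
    for _ in range(num_iterations):
        c = F.softmax(b, dim=1)                     # coupling coefficients per lower capsule
        s = (c.unsqueeze(-1) * u_hat).sum(0)        # weighted sum over lower capsules
        v = squash(s)                               # output vectors of the higher capsules
        b = b + (u_hat * v.unsqueeze(0)).sum(-1)    # agreement strengthens the routing
    return v
```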
At the time of writing, capsule networks achieve state-of-the-art performance on the MNIST handwritten digit recognition dataset on which the original CNNs were trained. However, by today's standards this is a relatively small dataset, and capsule networks have not yet been scaled to larger datasets. This is partly because the dynamic routing process slows down the training of capsule networks. However, if capsule networks are successfully scaled, they may introduce an important new form of model that extends the ability of neural networks to analyze images in a manner much closer to the way humans do.
Another recent model that has garnered a lot of interest is the transformer model (Vaswani et al. 2017). The transformer model is an example of a growing trend in deep learning in which models are designed with sophisticated internal attention mechanisms that enable a model to dynamically select subsets of the input to focus on when generating an output. The transformer model has achieved state-of-the-art performance on machine translation for some language pairs, and in the future this architecture may replace the encoder-decoder architecture described in chapter 5. The BERT (Bidirectional Encoder Representations from Transformers) model builds on the transformer architecture (Devlin et al. 2018). The BERT development is particularly interesting because at its core is the idea of transfer learning (as discussed above in relation to the data annotation bottleneck). The basic approach to creating a natural language processing model with BERT is to pretrain a model for a given language using a large unlabeled dataset (the fact that the dataset is unlabeled means that it is relatively cheap to create). This pretrained model can then be used as the basis for creating models for specific tasks in that language (such as sentiment classification or question answering) by fine-tuning the pretrained model using supervised learning and a relatively small annotated dataset. The success of BERT has shown this approach to be tractable and effective in developing state-of-the-art natural language processing systems.
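At the heart of the transformer is scaled dot-product attention; the short sketch below shows that core operation on its own. The tensor shapes are illustrative, and a real transformer adds multi-head projections, residual connections, and feedforward layers around this operation.

```python
# A minimal sketch of scaled dot-product attention (Vaswani et al. 2017).
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (sequence_length, d_k). Each output position is a weighted
    # mixture of the values V, with weights given by how well its query
    # matches each key -- the "attention" over the input.
    d_k = Q.shape[-1]
    scores = Q @ K.transpose(-2, -1) / (d_k ** 0.5)
    weights = F.softmax(scores, dim=-1)
    return weights @ V
```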
New Forms of Hardware
Today’s deep learning is powered by graphics processing units (GPUs): specialized hardware that is optimized to do fast matrix multiplications. The adoption, in the late 2000s, of commodity GPUs to speed up neural network training was a key factor in many of the breakthroughs that built momentum behind deep learning. In the last ten years, hardware manufacturers have recognized the importance of the deep learning market and have developed and released hardware specifically designed for deep learning, and which supports deep learning libraries, such as TensorFlow and PyTorch. As datasets and networks continue to grow in size, the demand for faster hardware continues. At the same time, however, there is a growing recognition of the energy costs associated with deep learning, and people are beginning to look for hardware solutions that have a reduced energy footprint.
Neuromorphic computing emerged in the late 1980s from the work of Carver Mead. A neuromorphic chip is composed of a very-large-scale integration (VLSI) circuit connecting potentially millions of low-power units known as spiking neurons. Compared with the artificial neurons used in standard deep learning systems, the design of a spiking neuron is closer to the behavior of biological neurons. In particular, a spiking neuron does not fire in response to the set of input activations propagated to it at a particular time point. Instead, a spiking neuron maintains an internal state (or activation potential) that changes through time as it receives activation pulses. The activation potential increases when new activations are received, and decays through time in the absence of incoming activations. The neuron fires when its activation potential surpasses a specific threshold. Because of the temporal decay of the neuron's activation potential, a spiking neuron only fires if it receives the requisite number of input activations within a time window (a spiking pattern). One advantage of this temporally based processing is that spiking neurons do not fire on every propagation cycle, and this reduces the amount of energy the network consumes.
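The behavior just described can be illustrated with a toy leaky integrate-and-fire simulation; the decay rate, threshold, and input pulse train below are illustrative assumptions rather than the parameters of any particular neuromorphic chip.

```python
# A toy simulation of the spiking-neuron behavior described above: the
# activation potential rises when inputs arrive, leaks away over time, and
# the neuron fires only when the potential crosses a threshold.

def simulate_spiking_neuron(input_train, decay=0.9, threshold=1.0):
    potential = 0.0
    spikes = []
    for incoming in input_train:          # one input value per time step
        potential = potential * decay + incoming
        if potential >= threshold:
            spikes.append(1)              # the neuron fires ...
            potential = 0.0               # ... and its potential resets
        else:
            spikes.append(0)
    return spikes

# Two pulses close together in time push the potential over threshold,
# whereas the same pulses spread far apart decay away without a spike.
print(simulate_spiking_neuron([0.6, 0.6, 0.0, 0.0, 0.6, 0.0, 0.0, 0.0, 0.6]))
```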
In comparison with traditional CPU design, neuromorphic chips have a number of distinctive characteristics, including:
1. Basic building blocks: traditional CPUs are built using transistor-based logic gates (e.g., AND, OR, and NAND gates), whereas neuromorphic chips are built using spiking neurons.
2. Neuromorphic chips have an analog aspect to them: in a traditional digital computer, information is sent in high-low electrical bursts in sync with a central clock; in a neuromorphic chip, information is sent as patterns of high-low signals that vary through time.
3. Architecture: the architecture of traditional CPUs is based on the von Neumann architecture, which is intrinsically centralized, with all the information passing through the CPU. A neuromorphic chip is designed to allow massive parallelism of information flow between the spiking neurons. Spiking neurons communicate directly with each other rather than via a central information-processing hub.
4. Information representation is distributed through time: the information signals propagated through a neuromorphic chip use a distributed representation, similar to the distributed representations discussed in chapter 4, with the distinction that in a neuromorphic chip these representations are also distributed through time. Distributed representations are more robust to information loss than local representations, and this is a useful property when passing information between hundreds of thousands, or millions, of components, some of which are likely to fail.
Currently there are a number of major research projects focused on neuromorphic computing. For example, in 2013 the European Commission allocated one billion euros in funding to the ten-year Human Brain Project. This project directly employs more than five hundred scientists and involves researchers from more than a hundred research centers across Europe. One of the project's key objectives is the development of neuromorphic computing platforms capable of running a simulation of a complete human brain. A number of commercial neuromorphic chips have also been developed. In 2014, IBM launched the TrueNorth chip, which contained just over a million neurons connected together by over 256 million synapses. This chip uses approximately 1/10,000th the power of a conventional microprocessor. In 2018, Intel Labs announced the Loihi (pronounced low-ee-hee) neuromorphic chip. The Loihi chip has 131,072 neurons connected together by 130,000,000 synapses. Neuromorphic computing has the potential to revolutionize deep learning; however, it still faces a number of challenges, not least of which is the challenge of developing the algorithms and software patterns for programming hardware that is massively parallel on this scale.
Finally, on a slightly longer time horizon, quantum computing is another stream of hardware research that has the potential to revolutionize deep learning. Quantum computing chips are already in existence; for example, Intel has created a 49-qubit quantum test chip, code named Tangle Lake. A qubit is the quantum equivalent of a binary digit (bit) in traditional computing. A qubit can store more than one bit of information; however, it is estimated that it will require a system with one million or more qubits before quantum computing will be useful for commercial purposes. The current time estimate for scaling quantum chips to this level is around seven years.
The Challenge of Interpretability
Machine learning, and deep learning in particular, are fundamentally about making data-driven decisions. Although deep learning provides a powerful set of algorithms and techniques for training models that can compete with (and in some cases outperform) humans on a range of decision-making tasks, there are many situations where a decision by itself is not sufficient. Frequently, it is necessary to provide not only a decision but also the reasoning behind it. This is particularly true when the decision affects a person, be it a medical diagnosis or a credit assessment. This concern is reflected in privacy and ethics regulations governing the use of personal data and algorithmic decision-making pertaining to individuals. For example, Recital 71 of the General Data Protection Regulation (GDPR) states that individuals affected by a decision made by an automated decision-making process have the right to an explanation of how the decision was reached.
Different machine learning models provide different levels of interpretability with regard to how they reach a specific decision. Deep learning models, however, are possibly the least interpretable. At one level of description, a deep learning model is quite simple: it is composed of simple processing units (neurons) that are connected together into a network. However, the scale of the networks (in terms of the number of neurons and the connections between them), the distributed nature of the representations, and the successive transformations of the input data as the information flows deeper into the network make it incredibly difficult to interpret, understand, and therefore explain how the network is using an input to make a decision.
The legal status of the right to explanation within GDPR is currently vague, and the specific implications of it for machine learning and deep learning will need to be worked out in the courts. This example does, however, highlight the societal need for a better understanding of how deep learning models use data. The ability to interpret and understand the inner workings of a deep learning model is also important from a technical perspective. For example, understanding how a model uses data can reveal if a model has an unwanted bias in how it makes its decisions, and also reveal the corner cases that the model will fail on. The deep learning and the broader artificial intelligence research communities are already responding to this challenge. Currently, there are a number of projects and conferences focused on topics such as explainable artificial intelligence, and human interpretability in machine learning.
Chris Olah and his colleagues summarize the main techniques currently used to examine the inner workings of deep learning models as feature visualization, attribution, and dimensionality reduction (Olah et al. 2018). One way to understand how a network processes information is to understand what inputs trigger particular behaviors in a network, such as a neuron firing. Understanding the specific inputs that trigger the activation of a neuron enables us to understand what the neuron has learned to detect in the input. The goal of feature visualization is to generate and visualize inputs that cause a specific activity within a network. It turns out that optimization techniques, such as backpropagation, can be used to generate these inputs. The process starts with a randomly generated input, and the input is then iteratively updated until the target behavior is triggered. Once the required input has been isolated, it can be visualized in order to provide a better understanding of what the network is detecting in the input when it responds in a particular way. Attribution focuses on explaining the relationship between neurons, for example, how the output of a neuron in one layer of the network contributes to the overall output of the network. This can be done by generating a saliency map (or heat map) for the neurons in a network that captures how much weight the network puts on the output of a neuron when making a particular decision. Finally, much of the activity within a deep learning network is based on the processing of high-dimensional vectors. Visualizing data enables us to use our powerful visual cortex to interpret the data and the relationships within it. However, it is very difficult to visualize data that has a dimensionality greater than three. Consequently, visualization techniques that are able to systematically reduce the dimensionality of high-dimensional data and visualize the results are incredibly useful tools for interpreting the flow of information within a deep network. t-SNE is a well-known technique that visualizes high-dimensional data by projecting each datapoint into a two- or three-dimensional map (van der Maaten and Hinton 2008). Research on interpreting deep learning networks is still in its infancy, but in the coming years, for both societal and technical reasons, this research is likely to become a more central concern for the broader deep learning community.
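As a small illustration of the dimensionality-reduction approach, the sketch below uses the scikit-learn implementation of t-SNE to project a hypothetical matrix of 128-dimensional hidden-layer activations onto two dimensions for plotting; the activations array is a placeholder that would, in practice, hold the recorded outputs of a hidden layer for a set of inputs.

```python
# Projecting high-dimensional hidden activations to 2-D with t-SNE.
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

activations = np.random.rand(500, 128)   # placeholder 128-dimensional activations

embedding = TSNE(n_components=2).fit_transform(activations)  # shape (500, 2)

plt.scatter(embedding[:, 0], embedding[:, 1], s=5)
plt.title("t-SNE projection of hidden-layer activations")
plt.show()
```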
Final Thoughts
Deep learning is ideally suited to applications involving large datasets of high-dimensional data. Consequently, deep learning is likely to make a significant contribution to some of the major scientific challenges of our age. In the last two decades, breakthroughs in biological sequencing technology have made it possible to generate high-precision DNA sequences. This genetic data has the potential to be the foundation for the next generation of personalized precision medicine. At the same time, international research projects, such as the Large Hadron Collider and Earth-orbiting telescopes, generate huge amounts of data on a daily basis. Analyzing this data can help us to understand the physics of our universe at the smallest and the largest scales. In response to this flood of data, scientists are, in ever increasing numbers, turning to machine learning and deep learning to enable them to analyze it.
At a more mundane level, however, deep learning already directly affects our lives. It is likely that, for the last few years, you have unknowingly been using deep learning models on a daily basis. A deep learning model is probably being invoked every time you use an internet search engine, a machine translation system, a face recognition system on your camera or a social media website, or a speech interface to a smart device. What is potentially more worrying is that the trail of data and metadata you leave as you move through the online world is also being processed and analyzed using deep learning models. This is why it is so important to understand what deep learning is, how it works, what it is capable of, and what its current limitations are.
Entropy experienced a rebirth during the Second World War. The American mathematician Claude Shannon was working on encrypting communication channels, including the one connecting Franklin D. Roosevelt and Winston Churchill. That experience led him, over the following years, to think deeply about the fundamentals of communication. Shannon sought to measure the amount of information contained in a message. He did so in a roundabout way, treating knowledge as a reduction of uncertainty.
At first glance, the equation Shannon arrived at has nothing to do with steam engines. Given the set of possible characters in a message, Shannon's formula defines the uncertainty about which character will appear next as the sum, over characters, of each character's probability of occurring multiplied by the logarithm of that probability. If, however, every character is equally probable, Shannon's formula simplifies and becomes exactly the same as Boltzmann's entropy formula. The physicist John von Neumann is said to have urged Shannon to call his quantity "entropy," partly because it agreed so closely with Boltzmann's quantity, and partly because "no one knows what entropy really is, so in a debate you will always have the advantage."
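Written out in standard notation (a restatement of the relationship described above), with characters c_1, …, c_N occurring with probabilities p_1, …, p_N, Shannon's measure is

H = −Σ_i p_i log₂ p_i,

and when every character is equally likely (p_i = 1/N) this reduces to H = log₂ N, which has the same form as Boltzmann's entropy S = k_B ln W for W equally probable microstates.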
The perceptron is a classic single-layer neural network model that maps a multidimensional stimulus x to a binary label y through a linear threshold learning procedure. To realize this mapping, the perceptron applies a linear transformation via the network weights, and the weights are updated based on feedback from the true label t. In other words, the perceptron learns only when it makes an error. One might therefore expect optimal learning to be associated with the maximum error rate. However, because the perceptron learning rule is effectively a form of gradient descent, the earlier analysis applies here as well: the optimal error rate during training should be 15.87%.
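A minimal sketch of this error-driven update is shown below; the 0/1 label convention and the learning rate are assumptions made for illustration, and the key point is that the weights change only when the prediction is wrong.

```python
# Error-driven perceptron update: no error, no learning.
import numpy as np

def perceptron_update(w, x, t, lr=0.1):
    y = 1 if np.dot(w, x) > 0 else 0   # linear threshold prediction
    w = w + lr * (t - y) * x           # zero update whenever y == t
    return w
```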
This theoretical result holds equally for artificial and biologically inspired neural networks. That is, the "85% rule" applies to a wide range of machine learning algorithms, including multilayer feedforward neural networks, recurrent neural networks, the various deep learning algorithms based on backpropagation, Boltzmann machines, and even reservoir computing networks [21, 22]. The analysis of maximizing the gradient ∂ER/∂β also shows that it applies to learning in biologically plausible neural networks, and indeed to any process that affects the precision of neural representations, such as attention, engagement, or cognitive control more generally [23, 24]. In the latter case, for example, the benefit of engaging cognitive control is maximized when ∂ER/∂β is maximized. Drawing on work on the Expected Value of Control theory [23, 24, 25], the learning gradient ∂ER/∂β appears to be monitored by control-related regions of the brain, such as the anterior cingulate cortex.
For example, in research on perception and aesthetics, the physicist Richard Taylor at the University of Oregon studied visual fractal patterns and found that if a blank sheet of paper is assigned dimension D = 1 and a completely blackened sheet dimension D = 2, so that drawn patterns have dimensions between 1 and 2, then the human eye prefers patterns with dimension D = 1.3 [26]. In fact, many natural objects have a fractal dimension of about 1.3, and this is the level of complexity at which people feel most comfortable. Some famous artists, such as the abstract expressionist Jackson Pollock, produced fractal abstract paintings with dimensions between D = 1.1 and 1.9; paintings with higher fractal dimensions create a greater sense of pressure on the viewer [27].
The psychologist Rolf Reber proposed, in the processing fluency theory of aesthetic pleasure [28], that we have these preferences because the brain can process such content quickly. When we can process something rapidly, we get a positive response. For example, fractal patterns with D = 1.3 are processed quickly, and so they elicit a pleasurable emotional response. Similar ideas have also been put forward in the fields of design and art by the psychologist Donald Arthur Norman and the art historian Ernst Gombrich.
Comparing D = 1.3 with the 15.87% error rate on a common scale, the former gives a ratio of the added fractal complexity to the whole, that is, unknown to known (or familiar to surprising, order to complexity), of roughly 0.3/1.3 ≈ 23.1%, which is larger than 15.87%. This way of measuring goes back to the mathematician George David Birkhoff, who proposed in his 1928 book Aesthetic Measure that if O stands for order and C for complexity, then the aesthetic measure of an object is M = O/C.
Celeste Kidd, Steven T Piantadosi, and Richard N Aslin. The goldilocks effect: Human infants allocate attention to visual sequences that are neither too simple nor too complex. PloS one, 7(5):e36399, 2012.
Janet Metcalfe. Metacognitive judgments and control of study. Current Directions in Psychological Science, 18(3):159–163, 2009.
B. F. Skinner. The behavior of organisms: An experimental analysis. New York: D. Appleton-Century Company, 1938.
Douglas H Lawrence. The transfer of a discrimination along a continuum. Journal of Comparative and Physiological Psychology, 45(6):511, 1952.
J L Elman. Learning and development in neural networks: the importance of starting small. Cognition, 48(1):71–99, Jul 1993.
Kai A Krueger and Peter Dayan. Flexible shaping: How learning in small steps helps. Cognition, 110(3):380–394, 2009.
Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. Curriculum learning. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 41–48. ACM, 2009.
M Pawan Kumar, Benjamin Packer, and Daphne Koller. Self-paced learning for latent variable models. In Advances in Neural Information Processing Systems, pages 1189–1197, 2010.
David E Rumelhart, Geoffrey E Hinton, Ronald J Williams, et al. Learning representations by back-propagating errors. Cognitive modeling, 5(3):1, 1988.
Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.
Chi-Tat Law and Joshua I Gold. Reinforcement learning can account for associative and perceptual learning on a visual-decision task. Nat Neurosci, 12(5):655–63, May 2009.
WI Schöllhorn, G Mayer-Kress, KM Newell, and M Michelbrink. Time scales of adaptive behavior and motor learning in the presence of stochastic perturbations. Human movement science, 28(3):319–333, 2009.
Ronald J Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning, 8(3-4):229–256, 1992.
Frank Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological review, 65(6):386, 1958.
William T Newsome and Edmond B Pare. A selective impairment of motion perception following lesions of the middle temporal visual area (mt). Journal of Neuroscience, 8(6):2201–2211, 1988.
Kenneth H Britten, Michael N Shadlen, William T Newsome, and J Anthony Movshon. The analysis of visual motion: a comparison of neuronal and psychophysical performance. Journal of Neuroscience, 12(12):4745–4765, 1992.
Mihaly Csikszentmihalyi. Beyond boredom and anxiety. Jossey-Bass, 2000.
Matti Vuorre and Janet Metcalfe. The relation between the sense of agency and the experience of flow. Consciousness and cognition, 43:133–142, 2016.
Robert Bauer, Meike Fels, Vladislav Royter, Valerio Raco, and Alireza Gharabaghi. Closed-loop adaptation of neurofeedback based on mental effort facilitates reinforcement learning of brain self-regulation. Clinical Neurophysiology, 127(9):3156–3164, 2016.
J. De Houwer, D. Barnes-Holmes, and A. Moors. What is learning? On the nature and merits of a functional definition of learning. https://www.ncbi.nlm.nih.gov/pubmed/23359420
Herbert Jaeger. The “echo state” approach to analysing and training recurrent neural networks-with an erratum note. Bonn, Germany: German National Research Center for Information Technology GMD Technical Report, 148(34):13, 2001.
Wolfgang Maass, Thomas Natschläger, and Henry Markram. Real-time computing without stable states: A new framework for neural computation based on perturbations. Neural computation, 14(11):2531–2560, 2002.
Amitai Shenhav, Matthew M Botvinick, and Jonathan D Cohen. The expected value of control: an integrative theory of anterior cingulate cortex function. Neuron, 79(2):217–240, 2013.
Amitai Shenhav, Sebastian Musslick, Falk Lieder, Wouter Kool, Thomas L Griffiths, Jonathan D Cohen, and Matthew M Botvinick. Toward a rational and mechanistic account of mental effort. Annual Review of Neuroscience, (0), 2017.
Joshua W Brown and Todd S Braver. Learned predictions of error likelihood in the anterior cingulate cortex. Science, 307(5712):1118–1121, 2005.
Hagerhall, C., Purcell, T., and Taylor, R.P. (2004). Fractal dimension of landscape silhouette as a predictor for landscape preference. Journal of Environmental Psychology 24: 247–55.
A. Di Ieva. The Fractal Geometry of the Brain.
Rolf Reber, Norbert Schwarz, and Piotr Winkielman. Processing fluency and aesthetic pleasure: Is beauty in the perceiver's processing experience? http://dx.doi.org/10.1207/s15327957pspr0804_3
Jaume Rigau, Miquel Feixas, and Mateu Sbert. Conceptualizing Birkhoff's aesthetic measure using Shannon entropy and Kolmogorov complexity. https://doi.org/10.2312/COMPAESTH/COMPAESTH07/105-112