作者： deepoo

王颖：揭开自由心证的面纱：德国意涵与中国叙事

一、引言：自由心证理论研究之迷思

自由心证的迷雾一直笼罩在我国刑事诉讼理论与实务之上，似有似无，似虚似实。在传统客观主义真实观与对法官自由裁量的质疑之下，《中华人民共和国刑事诉讼法》（以下简称“《刑事诉讼法》”）及司法解释设立了大量证据规范限缩法官自由裁量权，以期实现刑事审判之客观化。然而，纯粹客观的司法裁判仅是一种理想的乌托邦，所谓“良法善治”，良法经由法官运用才能善治，实然司法之中不仅无法回避自由心证，亦需要法官自由心证回应个案特性与现实需求。

揆诸现实，自由心证原则是否存在于我国刑事诉讼规范之中仍存有争议，但与之相关的理论研究却方兴未艾。有学者将我国刑事证据立法与现实定义为“新法定证据主义”；还有学者肯定了我国自由心证的存在，并提出我国刑事诉讼证据证明模式属于自由心证之亚类型“印证证明”；亦有学者认为我国证据制度实属自由心证制度。不论观点差异，在此之中“自由心证”至少在三个维度使用：作为证据制度的自由心证，与法定证据制度相呼应；作为司法证明模式的自由心证，与印证、拼图、综合证明等相比较；作为证明标准的自由心证，与证据确实充分、事实清楚抑或排除合理怀疑相对照。自由心证似乎飘渺无形，却又无处不在。这不禁让人困惑，自由心证的内涵与外延究竟如何？自由心证是证据制度、司法证明模式抑或证明标准？自由心证与印证证明的关系又如何厘清？经验法则、逻辑法则是否属于自由心证范畴？自由心证与证据确实充分、排除合理怀疑究竟是何种关系？此看似涉及自由心证原则与我国刑事证据核心理论之逻辑关联，实则关涉自由心证内涵、法律性质、适用场域等基本问题之厘清，最终直指自由心证原则在我国刑事诉讼理论中的体系地位。

事实上，由于长期的语言隔阂与对大陆法系职权主义之偏见，我国对自由心证原则的历史嬗变与当代意涵仍存有不少理论误读与研究缺位。囿于此种现状，学界不仅对自由心证原则存有迷思与混沌，亦导致相关理论研究无法从当代理论与司法实践中汲取灵感，容易偏离法学规范视角而走向虚无主义，在未厘清法学问题的教义学内涵与边界之时，却又将其异化成哲学或心理学问题，容易导致研究之根基不稳与立论偏离。当然，随着学界对德、法等国刑事诉讼理论的直接引介、深入研究与审慎反思，误解得到一定澄清，偏见得以部分破除。然而，对自由心证理论沿革的研究似乎起于神明裁判、止于法国大革命；对自由心证原则内涵之理解亦止于“法官根据自己的理性、经验和良心，对证据的证明力大小强弱进行自由判断，法律不作任何限制性的规定”，而对与我国刑事法基础理论能够有效承接，与证据属性理论、证明责任理论等证据基础理论深度融合的德国自由心证原则之深入研究甚少。

毋庸置疑，近代意义的自由心证原则萌芽于18世纪的法国，在法国大革命中由杜波尔提出并确立。然而，在此后相当长的时间里，法国学术界并不承认存在刑事证据一般理论，直至20世纪初才日渐出现体系性的刑事证据一般理论。与之形成鲜明对比，自由心证原则自法国大革命后传入德国，1846年由时任普鲁士立法部长、著名法学家萨维尼（Savigny）提出并确立。此后近两百年间，在德国哲学思辨传统与严谨的刑事法理论影响之下，自由心证原则得以独立发展，在改革的浪潮中不断与证据属性理论、严格证明理论、证据责任理论等德国证据理论深度融合，并在司法实践中伴随法官裁判呈现出历久弥新的强大生机与活力。

作为一个以职权主义为底色的国家，无论是刑事证据理论之完善抑或刑事程序之革新均应建立在深入理解职权主义传统及传统职权主义理论之上。而鉴于德国刑事法对大陆法系国家之深远影响，德国自由心证原则对我国刑事诉讼亦具有重要学理价值。事实上，我国学界出现的前述概念混用、术语误解和研究迷思均与德国自由心证原则的理论研究缺位存有一定关联。有鉴于此，有必要对德国自由心证原则的历史嬗变与当代意涵进行系统梳理与阐述。只有在明晰自身机理的前提下，才能对自由心证原则的中国叙事脉络予以准确勾勒与检视，以揭开自由心证原则之面纱，澄清学界之误解，明晰其中国意象，并为厘清晚近新法定证据主义、印证证明、经验法则与逻辑法则、刑事证明标准等相关证据核心理论奠定基础。

二、自由心证原则的德国嬗变

从18世纪直至19世纪上半叶，随着法国自由心证思潮的涌入、刑讯逼供的废除与陪审团制度的引入，德国围绕新的刑事司法理念与制度展开了论战。由于法国自由心证原则与人民民主理论和陪审团制度直接相关，因此，德国首先探讨的并非“是否应当引入自由心证原则”，而是“是否应当确立陪审团制度”。在对陪审团制度的质疑中，怀疑论者继而展开了针对刑事证据理论与自由心证原则的反思。

（一）消极证据理论

1813年费尔巴哈（Feuerbach）对以陪审团制度与自由心证原则为核心的法式哲学展开批判：“陪审团靠直觉判案与教会裁判并无本质区别，前者漫不经心地等待自然启示的光芒，而后者则是等待上帝的灵感。”他并不认为在一种“清醒梦境状态下”的裁判能够比基于理性的权衡更加公正，而认为化解证据体系危机的方法并非是放弃证据规则，而是通过法定规则与法官心证结合互补，以“消极替代积极”。

所谓消极证据理论（Negative Beweistheorie），是相对于法定证据主义下的积极证据理论（Positive Beweistheorie）而言的。积极证据理论指立法者之积极，由立法预先规定法官证据裁判的规则与证明力大小；消极证据理论则主张，立法者不应预先规定应当在何处找寻心证，而仅能规定在何处不可找寻心证，证据规则应当仅最低限度干涉裁判过程，不能代替法官探寻实质真实的权利和义务。因此，消极证据理论的核心并非完全消除传统法定证据，而是通过改良而承认法官自由心证的可能性。消极证据理论一度受到学界认同并获得立法确认，但最终伴随自由心证原则的确立而在德国法律演变的历史长河中消亡。

（二）自由心证与陪审团一体化的整体印象论

与此同时，伴随着法国大革命的民主思潮，德国兴起了一种将内心确信视为整体印象的理论（Totaleindruck）。整体印象论也与法国民主至上思想相承接、与陪审团制度相呼应，其秉承的基本立场是：职业法官受法定证据规则的约束展开证据评判，而陪审团则直接基于不受规则约束的总体印象判案。陪审团作为普通公民，能在审判中直观地体验犯罪呈现在法庭上的过程，并基于良知与理性形成犯罪与否的整体印象。故此，整体印象论认为职业法官是传统法定证据主义之产物，而陪审团则是新兴自由心证理论之成果。

（三）自由心证与陪审团分离的思想启蒙

然而，完全受证据约束的职业法官审判与只受良心约束的外行群众审判，这种非此即彼的替代关系受到越来越多的质疑。米特麦耶（Mittermaier）认为康德所定义的真实才是自由心证最好的诠释：真实是认识主体与被认识客体之统一。因此，他认为对法律确定力的寻求是一项专业工作，法定证据制度保证了“决策理由之价值”，仅在预先告知适用证据规则并给出判决理由的前提下，陪审团才具有正当性，没有证据规则制约的心证只会形成不合理、无依据的主观揣测。在这种理念之下，陪审团事实上类似于德国传统法官的角色，这也意味着，米特麦耶重新定义了自由心证的内涵，自由心证并非单纯的猜测，而是基于理性的权衡，主观的心证塑造是一种理性的思维过程。

基于此，自由心证的认识论基础从法国强调主观的绝对理性转向了德国康德主义哲学的主客体一致性，自由心证并不必然与陪审团制度如影相随，而具有独立的价值与意义。与此同时，对职业法官的过度限制亦受到了越来越多的批判。实际上，赋予法官根据个案审查证据的权利与过度限制法官证据评估的标准相矛盾。至此，自由心证理念最终得以脱离陪审团制度而独立存在，适用范围进而拓展到职业法官的范畴。

（四）自由心证原则的正式确立

1846年，时任普鲁士立法部长的萨维尼在《刑事程序基本问题备忘录》中提出应当完全摒弃法定证据理论，主张法官应当根据理由和法规推导判决，探寻和适用证据规则的权力亦完全属于法官，这样才能充分顾及思维规律、经验与洞察力。萨维尼并不怀疑塑造内心理性确信的必要性，但是怀疑抽象的证据规则不能穷尽所有个案特殊性。与此同时，他并不赞同陪审团，因为证据裁判是根据法定证据规则或者没有此种规则时需要持续不断训练的专业工作。总而言之，萨维尼所认同的自由心证概念更多地植根于德国传统，而非借鉴法国经验。

在此种理念的倡导下，1846年《普鲁士法》第19条正式确立自由心证原则。该法废除了所有针对柏林刑事法院的证据规则，赋予了职业法官根据证据自由心证达到内心确信而判决的权力。伴随此种自由权，立法者亦规定法官阐明判决理由的义务，即法官享有自由心证的权力，但亦应当在判决中述明推导此判决的理由。相比于法式的内心确信（l’intime conviction），萨维尼所理解的自由心证（freie Beweiswürdigung）更接近康德的理解，即内心确信需要每一个具有理智之人的认同。因此，德国自由心证原则更多地受到德国哲学与法学传统思想中自由主义与理性主义的影响，而非法国的天赋人权与陪审团制度，并具有更强的内驱动力与司法目的导向，旨在消除传统法定证据规则之桎梏。至此，自由心证原则成为德国刑事诉讼中法官证据裁判的一项基本原则，并随着司法理念的变迁与司法实践的发展而获得新的意涵与生命力。

三、自由心证原则的理论困境与突破

（一）理论困境

从法定证据演变到自由心证是一种历史进步，法定证据的教条与僵化得以克服，刑讯逼供在立法上得以废除，但接踵而来的是新的理论困境与实践难题。德国自由心证原则的核心条款为《德国刑事诉讼法典》第261条，即“法官根据审判所建立起来的内心确信判定证据调查之结果”。条款如此简明扼要，如此主观抽象，如何替代原本法定证据制度中的证明标准？不同法官所抵达的自由心证亦存在差异，如何保障类案正义？

即便在理想条件下，法庭所呈现的事实与客观发生的事实不可避免地存在距离，法官通过理性推理判断很难作出完全一致的重构，仅能获得一种概率意义上的认知与见解。因此，自由心证原则虽避免了法定证据主义之弊端，却又产生了两个理论难题：首先，自由心证法律属性与构成要件之澄清。自由心证到底是一种纯粹主观判定抑或存在客观性？这涉及主观主义与客观主义之争。其次，自由心证到底需要在待证事实与证据间建立何种程度的确证？这关涉证明标准之确立。这两个根本问题随着联邦最高法院的判例而在实践中逐渐明晰。

（二）主观主义与客观主义之争

1.帝国法院观点之对峙

主观主义与客观主义之争，是一个发轫于德国刑事司法实务的理论问题。从帝国法院到联邦最高法院，均致力于在个案中明晰自由心证原则的内涵与边界，其中，帝国法院的两份判决对于现今自由心证内涵之明晰与基本范畴之厘定有着重大意义，分别代表着客观主义与主观主义的基本立场。

（1）客观主义：高度盖然性论（RGSt 61, 202）

自由心证原则之确立以人类理性觉醒与确定性认知为前提，在法官内心确信中体现为对心证盖然性的承认和盖然性程度确证的理解差异。1927年，帝国法院首次在RGSt 61, 202判决中确立了刑事案件定罪意义上的内心确信，并提出以高度盖然性（die hohe Wahrscheinlichkeit）作为自由心证内心确信的标准。此种观点的核心要旨在于，在纯粹主观主义的自由心证原则之中引入客观主义底色的概率判断，即高度盖然性，与此同时强调不通过主观因素过分夸大“高度盖然性”，而是通过推定降低真相所要求的盖然性程度。虽然判决中并未涉及证明标准的维度，但至少明确确立了证明标准是基于诉讼材料所获得的高度盖然性，此种高度盖然性由法官基于理性推理而来，代表着自由心证的客观化。

（2）主观主义：内心确信论（RGSt 66, 163）

然而，1932年帝国法院在RGSt 66, 163判决中认为，基于盖然性的内心确信并不足以支撑起定罪量刑，法官必须达到完全的内心确信（volle Überzeugung），但又强调人类认知能力具有边界，将事实与最高程度的盖然性相提并论实则属于概念上的不精确。由于此种客观真相（objektive Wahrheit）实质上无法获得，因此，法官必须尽其所能达到一种基于司法良知有效的确信，以避免可能的错误与误判，尽可能地消除每一项怀疑。但是这其中的逻辑悖论是，如何通过人类去避免所有基于人类认知可能导致的错误？在案件审理中逐一排除怀疑是一种无法实现的乌托邦。

事实上，高度盖然性论与内心确信论完全针锋相对，代表着客观主义与主观主义的两个维度。高度盖然性论实则降低了真相查明的标准，法官基于理性所能获得的信息与认知作出裁判，其标准在于达到高度可能性或高概率的内心标准，而并不需要排除所有怀疑；而内心确信论要求法官即便是严格审理了诉讼材料，亦需要逐一排除可能的怀疑。

2.联邦最高法院观点之争鸣

在早期阶段，联邦最高法院的判决总体带有明显的客观主义色彩，认为证据评判标准在于法官内心所确信的高度盖然性，然而仍旧回避确定“盖然性”的概念与具体标准。但是1957年的判决发生重大转折，联邦最高法院的主流观点从客观高度盖然性论走向了主观内心确信论。

（1）主观主义论（Subjektive Theorie, BGHSt 10, 208）

联邦最高法院在BGHSt 10, 208判决中阐明：“法官必须在主观上排除客观可能存在的怀疑，才可达到内心确信并作出判决。此种个人内心确证（persönliche Gewissheit）是判决的必要条件，亦是充分条件。”这意味着对于最终判决起决定作用的不是客观可能的怀疑，而仅是法官自己主观产生的怀疑。主观主义论由此判决确立，即只要法官对案件事实产生主观确信，认为事实与法律认定不存在错误，并在判决理由中阐明自己的主观确信即可。但是主观主义招致了众多的反对意见，判决所需要的内心确信是否真的应当完全取决于初审法官的个人内心确证？若存在对被告人更有利的结论，法官是否仍可以定罪？此种推论是否与无罪推定原则冲突？由于主观主义存在这一明显缺陷，联邦最高法院此后事实上放弃了绝对的主观主义，认为即便是遵循主观主义判案，法官亦不能违背类似于法律规范的逻辑法则或已确证的科学知识；并且，为了验证法官是否遵循了逻辑法则，法官有义务在判决中全面列出证据评估的内容。

（2）生活经验论（Theorie der Lebenserfahrung）

联邦最高法院在后续的判决中实际上采用了一个相对折中的客观化视角，即生活经验论。联邦最高法院重申了一个不言而喻的基本原则，即当同时存在多种可能性时，法官必须说明他优先认同和选择某种可能性的理由，法官基于实际生活经验所能获得的确证即视为高度盖然性。但生活经验论的反对者认为，“生活经验”本身并未得到有效定义，且不清楚应当根据一般法官抑或理想法官的生活经验进行案件审理，并不具有普遍适用的可能性。

因此，黑德根（Herdegen）对生活经验论进行了论证，明确了生活经验论背后的推理逻辑。判决由法官作出并负责，因此不可能放弃法官个人确信，但是人类的认知很大程度取决于个人经历、认知局限、偏见及欲望，因此个人确信并非充分的判定标准。故此，人类无法完全获得案件真相，司法认定仅是一种盖然性判决。但是此种盖然性判决必须达到极高程度，却又不可能达到绝对确定。总而言之，应然层面能够达到的盖然性程度取决于人类理性推理的水平，而理性推理必须符合经验与智识要求并考量生活经验与公认价值原则。

3.主客观自由心证理论之确立（Objektive-subjektive Beweiswürdigungstheorie）

直到20世纪50年代，无论是“高度盖然性”抑或“内心确信”均不能在联邦最高法院的判决中占据通说地位，学界亦未深入探讨证据评判方法与标准问题。但当联邦最高法院日渐走向主观主义，放宽法官的自由裁判权并放弃了对法官心证塑造之限制时，学界却出现了反对的声音，认为联邦最高法院打开了非理性而无法控制的潘多拉之门。

以彼特斯（Peters）为代表的学者认为，基于事实评估的客观因素才应当是定罪量刑的决定性因素，在此基础上可以建立基于内心确证的主观因素；并且，法官基于自由心证之判决应可由其他法官理解和推导。心证的塑造不仅与高度盖然性紧密相连，更与案件真实息息相关，法官的自由心证不仅涉及个人内心确证的实现，亦涉及到盖然性之确定。并且，与法国相反，德国的判例和文献从未确认法官内心确信能够免除法官对案件理性审查之义务，故此，兼具主客观的自由心证原则并不存在理论障碍。彼特斯最大的贡献在于，他并未如此前的判例与学说般仅对“理性”进行概括性阐述，而是提出了具体的标准。在方法论层面，他提出法官必须首先审查单一证据的效力范围与可靠性，然后再经由逐步评估全面审查证据，形成证据链；在内容层面，法官在心证塑造过程中应当基于专业知识与经验，采取统一标准进行证据评估；在后果层面，为了防止自由专断，法官应当对自由心证之裁判负责。

彼特斯致力于刑事程序的理性化，他的核心观点是个人内心确证必须基于合理的基础，受到学界的普遍认可。此后，此种客观化的自由心证理论得到联邦最高法院的广泛认同，以客观理性与事实基础作为个人内心确证之前提的主客观自由心证理论获得通说地位：“司法定罪所需要的法官个人内心确证以客观事实基础为前提，必须基于理性论证得出已确定的事实与客观现实高概率相符合之结论。”具言之，法官定罪量刑的前提是获得内心确信，此种内心确信并非主观臆断或恣意评判，而是经过客观事实认定与理性论证过程，认为已查明的案件情况与客观发生的犯罪事实符合具有高度盖然性。

从以上对德国重要判例与学者观点之梳理可见，判例观点一直在自由评估证据与受约束评估证据之间摇摆。旧的帝国法院更为关注客观性与高度盖然性，联邦最高法院早期的判决以主观主义为基础，更倾向于追求主观个人确证；从20世纪80年代以来，基于合理性和主观确证性的客观化趋势越来越明显；但从20世纪90年代以来，又开始强调法官自身认知在判决确定中的重要地位。事实上，完全基于法官自由心证而不考量客观基础的绝对主观主义已无人支持；而对被告人罪责之完全确信亦是不可达到，纯粹的客观主义仅存于乌托邦之中。故此，现代德国的自由心证理论始终以主客观论为基本立场，形成一种动态博弈的平衡：当一段时间客观主义占上风时，判例便开始强调主观主义的功能，使得整体理论趋势归于平衡。

四、自由心证原则的德国当代意涵

从萨维尼确立德国自由心证原则到帝国法院提出内心确信论与高度盖然性论，从联邦最高法院不断探索到自由心证主客观理论的最终证立，德国自由心证原则在理性主义的光芒与主观主义的摇篮中成长，继而探索心证形成之过程并将其规范化。当代自由心证原则由法官内心确信（Richterliche Überzeugung）与证据自由评判（Freie Beweiswürdigung）两大核心要素构成，同时亦涵括实质性庭审、判决理由的书面阐述与自由心证之限制三大要旨。其中，法官内心确信、证据自由评判归于自由心证原则的积极实质要件；自由心证原则的限制为消极实质要件；实质性庭审、裁判说理与心证公开则属于自由心证原则的程序保障。

（一）法官内心确信

在证据使用禁止的基础上，具有证据能力之证据乃为法官自由心证之对象，而自由心证的终点需要抵达“法官内心确信”。那么，到底何为内心确信？经由前述判决的梳理与诠释，现代意义上的“法官内心确信”包含个人内心确证、客观事实基础、高度盖然性与高度个人化的判决四个维度的要件，并且强调法官心证塑造过程之公开。

首先，法官内心确信毋庸置疑具有主观性，它是法官的个体化主观确证。法官自由判断证据的依据是经验、理性与良知，并不可避免地受到法官主观感受与情感因素的影响，最终达到内心确信的程度。与此同时，由于刑事案件个案的偶发性与特殊性，既不存在亦不需要绝对的或概率的确定性，法官根据整体证据情况确定特定事实为真即可，因此对于法官内心确信不应当设立过高的、无法满足的要求。

其次，法官的主观确信必须以客观事实为基础，并非天马行空之恣意。内心确信建立在对犯罪主客观情况的全面审查与理性判断之上。虽然并不存在系统的刑事证明理论来确认证明力，但法官认定案件事实必须基于合法收集、具有证据能力、符合法定证据种类之证据。法官有义务全面收集证据并进行审查，最后基于专业知识、逻辑与经验法则获得高度盖然性的判决结论，并将此种心证过程公开。因此，法官内心确信并非打开了主观专断之大门，而是构建起了刑事诉讼规范与主观沟通的桥梁。

再次，内心确信并非某种必然性结论，亦可基于符合思维规律或生活经验的司法权衡，以高度盖然性结论的形式出现。必须承认，心证与犯罪事实无法完全印合，要求绝对与犯罪事实一致的心证结论并不现实。故此，司法裁判只能退而求其次，将高度盖然性视为真实，将法官对此种高度盖然性存在的认知视为对真实之确信。原则上，当法官已尽其所能评估现有证据后，认为犯罪事实存在具有高度盖然性则可判决有罪。若在内心确信过程中，对被告人的犯罪或罪责有所怀疑，则缺乏定罪所需的内心确信。

最后，自由心证原则要求法官作出高度个人化之判决，具有独立性与不可替代性。一方面，法官不可采用他人未经审验的观点或意见，比如排除证人的意见性证言；另一方面，原则上法官亦不受其他无罪释放或生效判决中事实认定之约束。判决之确立需要法官的内心确信，但是立法不得规定，在何种条件下法官才能达到此种内心确信。被告人是否有罪、有何种罪，是法官需要单独完成的任务，不受法定证据规则之约束。

（二）证据自由评判

自由心证原则的另一核心要素为“证据自由评判”。通常而言，法官不受成文法所确立的证明力规则约束，根据每种证据的个案价值，基于自身专业知识、审判经验、生活逻辑与常识、良心与正义感评判全案证据。法官不仅自由认定案件事实确证所需的条件、证据的证明力大小，而且自由确定多种证据的评判顺序、证据之间的相互关联。间接证据的评判亦应符合此基本原则。在基本原则之下，司法判例给证据自由评判设定了外部框架与评判标准。

首先，证据自由评判建立在理性客观基础之上。这里的客观基础，一方面指证据材料的客观性，即证据材料必须是客观、详尽、完整且不存在矛盾；另一方面亦包含法官证据评判的理性基础，即基于专业知识与逻辑基础进行案件事实的论理与论证。根据统计学分析发展而来的与“证据链/间接证据链”相对应的“证据闭环/间接证据闭环”理论，为证据评判的可控性提供了可能。

其次，证据自由评判要求全面审查合法证据。证据审查以法官在个案中审查单一证据展开，在确定单一证据并未因程序违法或基本权侵犯而禁止使用后，根据单一证据的性质、与案件的关联，确立证据在案件中的功能。在此基础上，法官综合评估全案证据，获得对案件事实之整体印象。

最后，证据自由评判必须详尽完整地评估全案证据。法官有义务详尽评估每个证据事实，考虑所有可能影响判决的细节，并在确证单个证据证明力大小的基础上，再基于一般逻辑法则、经验法则与专业知识综合评估全案证据。此种评估并非单个证据的孤立评价，而是对全案证据与案件事实的联系进行整体性评判，确立其中的逻辑关系与因果关系。因此，缺少整体评判的单一证据之孤立评判存在缺陷；而在法官竭其所能详尽评估全案证据后，仍无法对案件事实与罪责问题达成内心确信的情况下，则基于无罪推定原则作出最终判决。

（三）自由心证原则的限制

德国在长期的司法实践中亦通过判例设立了自由心证之边界，以防止司法专横与法官主观擅断。虽然判决是基于法官自身确信而获得的主观确定性，但它必须是基于合法取得之证据、客观可靠之事实基础和符合逻辑之结论。

首先，自由心证原则受到证据禁止制度的限制。自由心证原则不仅是证据证明力评判原则，亦是法官审理案件之综合原则。从证据能力与证明力评判角度，证据使用禁止的立法规制了证据资格问题，法官仅能对具有证据能力之证据展开证明力评判；而从刑事审判程序角度，证据使用禁止的确认实则无法跳脱自由心证，在绝对证据使用禁止的情形下，根据立法即可否定证据之证据能力，但在更多裁量证据使用禁止的情形下亦需要法官之自由裁量。

其次，自由心证原则受到逻辑法则、经验法则、证据综合评判规则及联邦最高法院判例等的引导与制约。在证据评判中最为重要的逻辑法则为司法三段论，在此涵摄过程中事实与规范交融，事实涵摄于法律，将具体的案件事实置于法律规范的构成要件之下，并据此判决。经验法则系基于一般生活经验、科学知识，以经验归纳或逻辑抽象等方式而获得的关于事实因果关系或属性状态的概括性结论与规则。证据综合评判规则是指在刑事证据审查中应当综合、详尽评估已查证属实之证据，证据需能够形成证据闭环或证据链。与此同时，联邦法院的判例赋予了证据调查与事实认定的具体规则，无论是嫌疑人/被告人的供述、辩解与沉默的审查，证人资格之确立，传闻证言、矛盾证言、利益相关证言、同案犯证言的审查，书证与勘验的审查，证据链证据环的形成、详尽评估与综合审查，直至存疑有利于被告人原则的适用，均存在着判例体系，因此证据评估受到《德国刑事诉讼法典》第337条法律事实审查的规范，经由此将抽象的法官自由心证过程规范化与具体化，最终达到对自由心证展开实质性约束之目的。

再次，在严格证明程序中，法官自由心证受到证据法定种类、法定证明程序的限制。具体而言，对于涉及被告人定罪量刑事实之认定必须遵循严格证明程序，采用法定种类的证据、严格遵循法定取证程序；可能的程序障碍亦会限制法官的自由心证，例如诉讼时效、有瑕疵的起诉、欠缺行为能力等。

最后，自由心证原则亦受到少量积极法定规则的限制，例如，在刑事程序中受到《德国刑法典》第190条侮辱罪中真相证明规则和《德国刑事诉讼法典》第274条庭审记录证明力条款的限制。

（四）自由心证原则的程序保障

1.实质性庭审

实质性庭审是自由心证原则践行的制度保障。法官应以实质性庭审中获取的信息与证据材料展开内心确信之塑造，在庭审中展开证据调查，得出不利于或有利于被告人的证据结论，并展开详尽评判。此种意义上的自由心证原则与直接言词原则相辅相成，即法官对案件事实之认定原则上均源自于庭审中呈现的证据材料，并对证据材料的提出均以言词陈述的方式进行，证据调查亦以口头方式展开。因此，用于案件判决的证据必须呈现于庭审中，并经过法庭质证程序认定。

一方面，法官在实质性庭审中应当遵循证据的用尽原则与证据绝对使用禁止原则。具言之，法官在庭审中应当充分利用审判程序中所有的证据材料，与此同时，判决的作出不可建立在法律规定应当禁止获取的证据上，包括《德国刑事诉讼法典》第136a条规定的禁止强迫自证其罪与第100d条规定的禁止侵犯公民绝对隐私权条款。另一方面，实质性庭审的内容原则上包括法官在审判中及通过审判知悉的一切信息，例如被告人供述、证人证言、鉴定人的询问、文件的宣读等。其中，涉及对被告人定罪量刑的证据则必须采用严格证明程序审理。但是，联邦最高法院与联邦宪法法院亦通过判例认定了一系列不能成为庭审对象的内容，主要包括：案卷内容，例如被告人在此前庭审中的陈述内容；法官在庭审外了解到的与案件相关的业务知识。

2.裁判说理与心证公开

防止法官恣意裁判、确保自由心证正当性的一个重要衡量标准是心证的可重复性。在应然层面，法官自由心证虽然是主观的内心认知活动，却是基于客观证据材料之理性判定。案件的实质真实虽无法百分百确证抵达，但可经由证据材料与法官心证尽可能接近，相似的理性人审查亦应可获得相同或相似的心证结论。

故此，除前述心证经验与理性、客观证据材料的指引外，自由心证原则的重要程序性保证即为裁判说理与心证公开，法官需要在判决中释明自由心证所凭据之事实和内心确信确立之理由，并阐明证据评估的事实基础。判决理由的书面阐明，一方面能够让普通公民获悉法官的心证塑造过程与判决理由，防止裁判之恣意；另一方面亦为上诉法院对判决内容进行审查提供基础，为救济错误的证据评判提供现实可能。具体而言，法官在判决中不能仅简单叙明心证结果，而应当清晰阐明案件犯罪事实与法律基础、所收集证据与逻辑推理裁判过程及因果关系；法官亦不能仅在判决中简单列举待证事实与证据，而应当对证据的合法性展开逐一审查，证据评判必须包括对单个证据证明力之确定并结合案件事实之权衡；此外，法官的释理应当结合案件事实与逻辑关联，对全案证据进行系统性综合评估。

判决的书面理由必须以谨慎和有条理的方式阐明，其标准在于上诉法院能够准确理解裁判过程和结论，有效审查裁判内容，判断是否存在证据裁判错误。而所谓无救济则无权利，在存在证据裁判错误时，即裁判存在矛盾、缺陷或不明确，或违反逻辑法则、经验法则，或对法官内心确信建立和定罪提出过高要求时，可以通过上告（Revision）实现法律救济。

五、自由心证原则的中国叙事

经由新中国初始的批判和20世纪末的反思，到21世纪刑事证据理论研究之勃兴、新法定证据主义的提出，到印证证明理论对自由心证原则的承认，直至经验法则、逻辑法则等研究的展开，自由心证原则在我国经历了一段曲折前进的历程。在此之中，自由心证原则的中国叙事图景不仅展现于自由心证之本，亦显露在新法定证据主义、印证证明等本土理论之中，其中掺杂不少混沌与误读。故此，有必要在澄清本土理论对自由心证误读的基础上，明晰中国自由心证的未来之路。

（一）现实主义的悲观：与自由心证悖离的新法定证据主义

我国立法者出于对法官自由裁量权之不信任、对客观裁判的心之向往、对证据真实性之优先考量，倾向于限制法官自由裁量权。故此，刑事诉讼立法中存在对单个证据证明力的普遍限制与对案件事实认定的一般规则。有学者将此种现象称之为新法定证据主义，并总结了典型特征：首先，立法区分证据证明力大小与强弱，并确立一系列证明力规则；其次，立法确认证据相互印证规则；再次，基于客观主义法定化证据裁判的证明标准；最后，法定化间接证据的证明体系。毋庸置疑，新法定证据主义旗帜鲜明地描述了我国刑事证据立法的部分特征，但却建立在对自由心证原则“传统但不系统”的理解之上。

首先，新法定证据主义论所描述的部分特征，并非对自由心证原则之否定，而是对纯粹主观主义自由心证原则之修正。例如，严格证明程序与刑事证据禁止制度均与当代自由心证原则共存，限制法官自由心证之边界。严格证明程序所包含的证据法定与程序法定属于传统法定证据主义之基本特征。而体系庞杂、覆盖全面的证据禁止制度亦是传统法定证据主义之体现，属于对自由心证原则之限制。并且，相比于我国非法证据排除规则狭窄的适用范围，德国刑事证据禁止所涉范围广泛，立法规定与司法裁判共存，实则涵盖我国部分“证明力规则”，其“法定证据”之特征甚至强于我国，但这并不构成对德国自由心证原则的否定。

其次，证据相互印证规则并非（新）法定证据主义之特征，证据相互印证是一种证明方法，是在逻辑层面对证据与事实间因果关系建立之确证。证据印证亦是德国自由心证原则下的一项非常重要的证据审查规则。因此，以我国刑事证据审查具有证据相互印证特征而去论证新法定证据主义，存在论据属性错误之嫌。事实上，在刑事证明中通过证据印证推理案件事实并无过错，需要反思的只是过度强调证据印证与证据印证的僵化适用问题。

再次，我国刑事诉讼法确立的证明标准“证据确实充分”并不能代表证明标准的法定化。立法仅是从主观与客观角度描述证明标准，并未确立精确刻度，实然层面导致了刑事证明标准的模糊与恣意。与此形成鲜明对比的是，《德国刑事诉讼法典》并未明确高度盖然性与排除合理怀疑之证明标准，但无论是判例、评注均认可主客观主义之证明标准，并明晰了证明标准之维度。此种精细化教义学解释与判例指引比我国的模糊化立法更具规范属性。

又次，间接证据证明的体系化亦非法定证据主义独有。虽然《德国刑事诉讼法典》并未明确列举间接证据审查之标准，但联邦最高法院之判例与评注均就自由心证、间接证据的综合评判作出详述，并形成立法之外的判例体系与规则。只是由于我国学界对德国法的比较研究与学术引介相对滞后，尚未对德国刑事诉讼法评注与判例给予足够的关注，并不了解德国刑事司法中基于判例产生、具有影响力的证据规则。

最后，中德刑事证据立法的差异并非全然源于传统法定证据与自由心证之立场对立与理念鸿沟，相反，其一定程度上是两国立法例与司法差异所致。中德虽均以法典为基础，然而，德国联邦最高法院之判例具有极高约束力，但并不存在类似于我国的司法解释；我国法院的判决并不具有此种高约束力，却存在独特的司法解释体系。很难断定究竟是德国最高法院的“禁止性”判例，还是我国司法解释中“原则性”规则更具约束力。故此，当视线经由简明扼要的法规延伸至背后的判例与评注，便会发现自由心证原则并非想象般自由，仍保留了传统法定证据主义合理的成分，并在判例与教义学理论之下展开了一场暗流涌动的客观化变革。

综上所述，当代自由心证原则早已不是那种基于康德式的、纯粹依靠人类理性与经验的自由心证原则，它具有浓厚的主客观统一色彩，虽然在路线上摒弃了传统法定证据主义，但实质上保留了其中合理的部分。不仅如此，在当今的自由心证之理论研究中，一个明显的趋势即是通过判例与教义学理论将自由心证原则客观化。概言之，刑事证据之审查判断无法避免由法官主导、审查与评判，即自由心证原则的适用具有客观必然性与必要性；并且，当代自由心证原则并非康德主义的人类理性与经验，而是一种基于客观证据、证明规则而抵达的法官主观确信，它亦吸收了传统法定证据主义中的合理部分；最后，我国新法定证据主义实则并非与当代自由心证原则完全对立。与其用“新法定证据主义”来定义我国刑事证据立法，不如更深层次地探究自由心证主义，将特定证据规则作为自由心证主义之限制，限缩自由心证主义之边界，并将视角转向过去、现在、将来均无法避免的法官心证内容、过程与方法之中，似乎更具有现实意义与价值。

事实上，新法定证据主义之提出是基于一种现实主义的悲观，甚至带有几分戏谑，我国法定证据理念的存在是一个“问题”，而不应当成为一种“定性”或“主义”，更不可能成为证据裁判一劳永逸的公式，亦容易导致我国立法与司法受之束缚，更难以革新前行。与此同时，新法定证据主义论者亦认为，自由心证是一项美妙的证据评判原则，是对法定证据主义的合理扬弃，但自由心证之确立需要现实制度保障。故此，在刑事司法不断前进、以审判为中心的诉讼体制改革不断深入的今日，不应当再以新法定证据主义去界定、限制、束缚我国刑事证据理论之发展，而应当看到日渐明亮而清晰的自由心证之曙光。

（二）理想主义的偏差：作为“自由心证亚类型”的印证证明

新法定证据主义直面法定证据主义与自由心证主义之争，而在另外一个维度——刑事证明领域，作为“自由心证亚类型”的印证证明亦声势浩大地占领我国刑事证据理论研究之阵地。印证证明论者认为，我国刑事司法以印证证明的方式证明案件事实，证据的证明力判断及证据的综合判断主要依靠法官根据个案情况作出。印证证明兼具证明力审查不受法定限制、允许法官在个案中基于具体情况审查证明力等基本属性，因此“印证证明模式”属于自由心证的一种亚类型。此后，印证证明论者又提出修正理论，例如印证证明是以“印证”为核心，但同时包含“心证”“追证”“验证”共同作用的刑事证明模式，并引发了印证证明理论的研究、批判与反思热潮，带来了“原子主义”与“整体主义”之争等。

跳出争论、回归原点，便会发现争议与迷思实则源自印证证明理论自身对自由心证原则的体系定位偏差与范畴理解错位。早期的印证证明理论将自由心证与印证证明作为同一维度的概念进行参照性研究，并认为印证证明模式属于自由心证的一种亚类型；而修正后的印证证明理论甚至将自由心证纳入印证证明体系，将自由心证变为印证证明的下位概念，尝试构建体系化的印证证明理论，而此种尝试遭致学界的多方批判与反思。相比于新法定证据主义将印证证明作为司法僵化的象征而系统批判，印证证明理论将印证证明划归于自由心证无疑更具说服力。但是，将两者定性为同一位阶却实为不妥。

首先，自由心证原则与法定证据主义相对立，并承载着相对明确的基本内涵：在审判中，法官通过审查证据获得对案件的认知，并达到内心确信进行裁判。而此种心证既包括主客观统一的内心确信之证明标准，亦包含理性、经验、逻辑、智识综合而成的一系列思维过程，例如归纳与演绎、推论、经验法则与逻辑法则、印证分析与综合分析等。故此，印证证明既非一种“自由心证”的亚类型，亦不涵盖证明标准，而属于法官自由心证的一种证明方法。即便再退一步，按照印证证明理论所述的自由心证指那种康德式的、基于理性主义、主观主义的自由心证概念，那它更不属于“典型的、通行的自由心证原则”，而属于在当代德国几乎没有学者支持的、萌芽期的自由心证原则。此种体系定位错误导致了印证证明的概念泛化与研究迷思，带来了对印证证明诸如无视经验法则、逻辑法则、全案证据综合分析等的批判，或引出了“印证为主，心证为辅” 的改良路径。如此种种缺陷与随之而来的批判实质上并不源于印证证明本身，而存在于被赋予过多内涵与期待的印证证明之外。印证证明理论的过度膨胀反而使自己进退维谷：毕竟要求几种位于同一层次、基于不同逻辑的心证方法又彼此融合，是一种逻辑悖论。

其次，印证证明理论混淆了印证证明与证明标准的作用场域。修正后的印证证明理论认为“心证既为证明方法，亦为证据标准，其特点是主观的内省性——事实判断者基于自身经验进行证据感知和思维，从而建立内心确信”。不可否认，无论是萌芽期的主观主义自由心证原则抑或现代的主客观统一的自由心证原则，均是涵括证明标准的内容的。但是，这并不意味着印证证明亦涵括证明标准。无论在大陆法系刑事诉讼语境还是我国刑事诉讼语境，印证证明与证明标准均分属于两个范畴：在德国，印证证明属于实现法官自由心证的一种证明方法，刑事证明标准则包括客观高度盖然性与主观排除合理怀疑双重意涵；在我国，虽然刑事证明标准仍有待明确和具体化，但不会跳脱犯罪事实清楚、证据确实充分、排除合理怀疑等基本范畴。故而，印证证明理论出于“修正”之目的对“印证证明”的广义解读与扩张适用，试图将证明标准纳入自身场域的尝试，实质上导致了证明方法与证明标准概念之混乱，继而进一步模糊了印证证明理论的基本定位与核心要旨。

再次，“以证据裁判为主，自由心证为辅”的证明模式亦存在一个基本范畴理解与定位的偏差：证据裁判原则与自由心证原则亦并非同一范畴内对等的概念。证据裁判是自由心证的前提和基础，两者并非是谁“主”谁“次”的关系，证据裁判对应的应属神明裁判，而自由心证所对应的却是传统法定证据主义。自由心证并非法官的凭空恣意裁判，自由心证必须以收集到的证据材料及可抵达的客观事实为基础。换言之，此观点实质上与前述对“自由心证（主义）”的理解存在相同的概念与范畴时空错位的问题，即在当代的讨论中仍在使用一百多年前的自由心证的概念与范畴。

此外，印证证明理论中对印证证明与直接言词原则的关系亦存在理解偏差。不可否认，自由心证原则实施的核心保障即为直接言词原则，直接言词原则与自由心证原则相伴相生，但是，并不代表着自由心证原则与直接言词原则下所获取之刑事证据证明案件事实不需要印证证明。由于刑事案件的特殊性，能够获取的直接证据极为有限，在大概率仅存在间接证据的情况下，无论是直接、言词审理还是间接、书面审理，证据间的相互印证对案件主要事实的判定均至关重要。庭审中审查的证据证明力的确证亦需要其他证据进行印证和补强，各证据之间亦需要形成证据链或证据环。在此维度上，自由心证原则与我国的印证证明具有契合之处。

事实上，在刑事诉讼领域，包括德国在内的大陆法系国家并没有孤证定案的传统，印证证明是大陆法系国家刑事审判的一种基本的，但并非唯一的证明方法或证据审查方法。自由心证原则的两大核心内容为自由裁判与内心确信，而基于印证证明的核心内容与基本属性，它属于自由裁判中的一种证明方法，一如经验法则与逻辑法则、证实与证伪、系统分析等。

六、回归与展望：中国自由心证原则的未来之路

在证据裁判原则下，传统法定证据主义与传统自由心证主义均属具有高度概括性的证据裁判类型，代表了证据与事实认定模式的两端。但是，两种证据审查模式并非泾渭分明、水火不容，既不存在法官毫无法律约束之绝对自由，亦不存在不考虑司法裁判特殊性、对证明力及事实认定标准的完全强制性规定。在传统法定证据主义与传统自由心证主义处于两端的光谱上，现代法治国家多依据司法传统与法律文化寻找合适本国的位置。

自由心证原则早已摒弃法国大革命时期的纯粹主观主义和康德式的纯粹理性主义，转而寻求一种主客观之平衡。德国自由心证原则的客观化趋势日渐明显，刑事程序中对自由心证原则的释明、限制与制约亦呈现增多的趋势。首先，在证据审查维度，自由心证必须在客观证据的基础之上、在证据裁判原则的框架之内展开；其次，在程序保障维度，以审判为中心的诉讼结构是自由心证原则运行的基础；再次，在司法制度维度，审判独立与法官职业化是自由心证原则的基本保障。故此，当代自由心证原则与我国法官的证据审查模式并无实质差异。

自由心证的德国迷雾已然散去，而中国面纱依旧若隐若现。事实上，自由心证原则在我国不是“是否存在”的问题，而是“是否承认它存在”的问题。一方面，法官裁判无法避免自由心证，亦不自觉地心证裁判；另一方面，立法者拒不承认自由心证的存在，导致立法上缺乏必要规范对“心证”予以保障和限制。这样一种悖论导致我国法官自由心证保障制度与限制规则之阙如。具言之，刑事司法中法官裁量必要性与立法者试图否认此种裁量必要性之间存在悖论与冲突。我国法官受到不合理证据规则的重重束缚而缺乏实质性事实认定权，但个案偶发性与证据多样性又决定了审判中法官自由裁量居于核心地位，证据裁判与案件认定必然要借助其专业知识、生活经验与逻辑推理。与此同时，我国刑事立法基于限制法官自由裁判的基本立场，对法官自由心证必然存在的客观司法现象视而不见，在客观主义“证据确实充分”的外衣之下，实际上隐藏着法官心证过度自由之风险。故此，在裁判中普遍存在套用法条程式化办案、对事实认定说理不充分、不公开阐述心证过程、不详细论证证据、法律与事实的逻辑关系与推理细节等问题，最终难以以理服人，司法公正性与权威性受到不断的质疑与挑战。

否认或回避自由心证在我国刑事司法的现实存在并非有效解决现有刑事证据审查与案件裁判难题的路径，掩耳盗铃只会导致在客观主义外衣之下纵容更多司法潜规则与恣意裁判。在承认自由心证之现实存在与自由心证原则确立之必要性的前提下，深入研究并领悟自由心证原则，结合我国刑事立法与司法语境，建构符合我国法治文化与立法背景的自由心证原则，才是真正促进我国刑事证据理论发展、推动庭审实质化、保障刑事裁判的必由之路。然而，我国自由心证原则之构建不仅需要传统法定证据理念与自由心证理念之融合，亦无法回避证明标准客观主义与主观主义之角力，更存在刑事立法回应司法现实关切之考验。故此，在厘清当代自由心证原则内涵、廓清我国新法定证据主义与印证证明理论之后，思路便回归到核心问题：基于我国刑事司法传统与立法构架，我国自由心证原则之轮廓何如？

我国自由心证原则在宏观维度应当包括基本内涵、原则下的具体规则、法规限制及程序性保障；微观维度则涉及法官自由心证的过程及影响因素、心证确信之标准即证明标准。此外，亦需要思考：自由心证原则如何平衡自由与约束的关系，如何防止法官自由裁量权之滥用，又如何适应和面对现代科技的发展及随之带来的挑战？尤其是在我国立法已经确立了众多证明力规则的现实面前，如何平衡法官自由心证与证明力规则的关系？囿于篇幅，本文无法细致描绘我国自由心证原则之全然面貌，而仅能基于自由心证原则之本源，勾勒其在我国刑事诉讼理论体系中的基本轮廓与应然地位。

首先，自由心证原则为刑事证据审查与司法裁判之基本原则。自由心证原则并不能归于纯粹的主观主义，而是主客观之融合，是基于客观合法证据的主观裁判。在我国证据理论研究中，无论是印证证明理论的研究，最佳解释推理、叙事或拼图综合证明模式的提出、原子主义与整体主义之争，还是对经验法则、逻辑法则之探讨，均属于自由心证研究的基本范畴。因此，可以说自由心证理念早已为我国学界所认可，静水深流的自由心证研究早已展开，然而对自由心证原则内涵之研究仍有必要围绕证据自由评判与法官内心确信两大核心要素深入展开：一方面，证据自由评判之客观基础为合法证据。故此，对取证程序合法之保障、对违法证据之有效排除实则是自由评判之前提，证据能力之评判为自由心证的前置性审查。在此基础上，法官应当主动行使证据调查权，对获取的合法证据进行单独审查与全案审查。另一方面，法官内心确信是证据自由评判旨在抵达的终点，是司法裁判的前提。“法官内心确信”既不是一种主观黑洞，亦非精确的刑事证明标准。它包含个人内心确证、客观事实基础、高度盖然性与高度个人化的判决四个维度的要件，并且强调法官心证的塑造过程，此四个要件保证法官内心确信的可信赖、可追溯、可救济。

其次，自由心证原则本身并非刑事证明标准，但却蕴含着刑事证明标准，即法官内心确信。自由心证要求法官基于证据之自由裁判达到内心确信的程度，而何为“内心确信”成为界定刑事证明标准之关键。自由心证主客观主义之争的核心议题即关涉证明标准之明晰。诚然，内心确信归于主观证明标准，但却融合了基于客观主义立场的高度盖然性与基于主观主义立场的排除合理怀疑。我国学界对刑事证明标准、证据确实充分与排除合理怀疑的研究可与之接轨，而其中所暗含的客观主义与主观主义之争或许可从“内心确信”的明晰之路中汲取灵感。事实上，若抽丝剥茧般地真正厘清了自由心证原则之内涵，即能明确刑事证明之标准。最后，在综合评判全案证据仍无法确保内心确信之时，法官则依据存疑有利于被告人之基本原则作出判决。

再次，自由心证原则并非无边界，而应通过立法与司法予以限制。此种心证之限制围绕实体定罪量刑程序（抑或称之“严格证明程序”）展开，以限制证据之证据能力、规范法定程序为核心，亦不全然排斥少量证明力规则。回归我国司法语境，法官自由心证似乎从未有过主观主义阶段，而是在与客观主义的博弈中才渐渐获得些许话语权，故此，相比于德国自由心证原则，我国自由心证原则存在更广维度的客观限制。而我国刑事证据研究对传统法定证据主义与自由心证原则的思维禁锢亦始于斯，自由心证原则建构难点亦显于斯。如前所述，相比于德国重视证据能力之审查而给予证明力审查之自由，我国明显是轻证据能力之审查而重证明力之规制，无论是取证程序规范还是证据排除规范均存在不少缺漏。因此，在德国大量程序违法之证据因不具有证据能力而被禁止进入证明力审查阶段，而在我国仅排除极少数非法获取之证据，从而导致大量程序违法证据进入证明力审查阶段。而司法解释所确立的众多证明力规则，其合法性、合理性与必要性暂且不论，实质多归属于经验法则之法定化，不具强制性规范属性。故此，我国自由心证原则的建构必须建立在全面审视、反思与重塑我国证据能力、证明力理论与规范的基础之上。在立法层面，有必要基于公民基本权保护理念，适当扩充证据能力限制条款，扩大非法证据排除范围；与此同时，全面梳理、缩减司法解释中的证明力规则，剔除不具强制性之规则，合并具有类似功能之规则。在司法层面，通过教义学理论与典型判例设立法官自由裁判与证明力规则之边界。

最后，职业法官的专业素养与正义良知固然是自由心证原则历久弥新、长久发展的基础，但程序保障才是其行稳致远的核心。实质性庭审的落实、判决说理与心证公开、法律救济途径的保障是自由心证原则从应然走向实然之需，以维护自由心证原则之正当性，并防止法官心证之恣意，以平衡自由心证原则的主观性与自由特性所带来的任意性与不确定性。

第一，无证据则无心证，无庭审则无裁判，实质性庭审是刑事诉讼的应有之义。“司法的根本特性是判断性，司法判断的前提是亲历性。”一方面，法院审判阶段应当成为刑事诉讼的中心，被告人的刑事责任应当在审判阶段而非在侦查、审查起诉或其他阶段认定；另一方面，法院庭审活动决定被告人的罪与罚的问题，即“审判案件应当以庭审为中心，事实证据调查在法庭，定罪量刑辩护在法庭，裁判结果形成于法庭”。故此，实质性庭审应以直接言词原则之落实、证人出庭义务之强化、被告人质证权之保障与法官证据调查权之赋予为核心，强调被告人、证人、鉴定人等亲自出庭陈述、接受质证与调查，并坚守主审法官与裁判法官合一，将主审法官庭审作为心证的主要来源渠道，继而从根本上确保法官能够获得足够的、真实的证据以支撑心证的形成，为以客观证据为基础构建自由心证提供可能。

第二，判决说理与心证公开是自由心证原则的防护墙，对心证的公正性起到实质性保障作用。首先，法官有义务分析证据之证明力大小及有无与案件待证事实的关联，阐明证据与事实认定、法律适用间的逻辑关系与推理过程，并释明得出有罪或无罪判决的理由。在此过程之中，法官必须重新考量心证过程、斟酌判决的逻辑推导过程，进一步保证判决的合理性与合法性。其次，自由心证的书面化能够监督、迫使法官在证据裁判与案件审理中更为谨慎、缜密，约束心证、防止恣意裁判。再次，判决理由的书面阐明能够保证被告人、上级法官及公众能够了解自由心证形成的过程，认可判决的合情合理，并藉此保障判决心证的可重复性，保证心证的正当化。又次，判决理由的书面阐述亦为被告人获得法律救济提供前提性基础，被告人有受保障的路径探知法官判决之理由，若其认为法官判决理由存在错误或不合法，则可有针对性地提起上诉。最后，上级法官亦能够通过书面说理的判决知晓下级法官审查证据与认定事实的思路与过程，亦为上诉审提供重要证据，从而对下级法官的心证起到间接的监督作用。

第三，无救济之权利非权利，无后果之义务非义务，法律救济途径的保障从内源倒逼法官合理、合法自由心证，避免恣意裁判。自由心证原则的有效执行亦需要保障被告人之上诉权，即若被告人发现法官自由心证存在特定法律错误，可以基于违反自由心证原则提起上诉。此类法律错误应当至少包括：法官未阐明判决理由或判决理由存在矛盾；法官未全面评估全案证据或遗漏重要证据；法官违反经验法则或逻辑法则进行事实认定或法律适用。故此，相比于印证证明理论、经验法则与逻辑法则等研究，自由心证原则的程序保障实质更具法学视角与规范属性，理应获得更多理论研究之重视。

七、结语

在主观主义与客观主义之间，德国自由心证原则在教义学理论与司法判例的影响中日益客观化。而在我国传统客观化的证据审查规则之下，自由心证理念亦早已静水深流地影响着我国司法实践。无论是新法定证据主义抑或印证证明理论，均属于对本土化自由心证理论的探索与尝试。在反思与纠误之间，本土化自由心证原则的轮廓亦逐渐明晰：作为刑事裁判之基本原则，自由心证原则并非证明标准，却蕴含证明标准；心证自由而非恣意，受到证据能力与证明力规则之限制；实质性庭审、判决说理与公开、法律救济途径是自由心证原则由应然走向实然之基本程序保障。

本文转自《比较法研究》2024年第6期

2024-12-15

杨联陞：传统中国对城市商人的统制

本文主旨，在就传统中国政府对城市商人之统制（包括控制与利用），提出若干看法，以供讨论。所谓商人，系用广义，一切行商坐贾、铺户店号，乃至当铺钱业牙行，均在讨论之列。所谓城市，亦取广义，兼指城镇，不论大小。所谓传统中国，时限可长可短。在本文多指帝国时代末期，自清初至鸦片战争一段，但亦有时兼及前后。

中国传统，远自二千余年以前，早已以农为本，视工商为末业，政府对四民之待遇，因有重轻。然就全帝国时代而言，亦不可一概而论。如《史记》、《汉书》所载，政府对商人之统制，包括贾人有市籍，不得为吏，不得名田，重其租税，乃至其车马服饰，亦受限制。此种政策，虽起于汉初（或更早），至武帝时，因财政关系，已有孔仅、桑弘羊等，由市井跃登朝列。其他限制，似亦渐成具文。此后在理论上，虽仍轻商，实则对于商人之控制与利用，力图兼顾。唐、宋以来，此种情形，更为显著，议论亦略有改变。读史者当就各时代分别观之，始能得其真象。如就清初至中叶一段论之，则对商人之控制，已不甚严，租税负担，亦非特重，政府且颇以恤商自许。利用则积前代之经验，特重“保”（如保商、保结、连环保）“包”（如包办、包额）诸术，颇有成效。

在清代商人入仕，远较前代为易。在隋、唐与辽代，工商及其子弟，均不得应科举。但此限制至北宋已见宽弛。据《宋会要·选举》，庆历四年（1044年）定“诸科举人，每三人为一保，所保之事有七”，其七为“身是工商杂类及曾为僧道者”并不得取应。细玩“身是”与“曾为”字样，则不但工商子孙可以应举，即曾为工商而今已改儒业者，似亦可以应举。更早者为淳化三年（992年）所定，“如工商杂类人内有奇才异行卓然不群者，亦并解送”。虽属特例，已开商贾应举之门矣。

金元时代，对商人应科举，似乎已无限制。明清更有所谓“商籍”，专为盐商子弟在本籍之外盐商营业之地报考生员，而且特为保留名额。据何炳棣教授之计数，盐商子弟，成进士者，明代近一百九十人，举人三百四十人。清代进士至乾隆之末，已达四百二十余人，举人八百二十余人，其中在18世纪，人数尤众。按明清商籍，盖仿元代河东之运学运籍。当异族入主之世，商人往往特受优待，亦可注意也。

科举之外，尚有捐纳一途，为富商入仕之捷径。清代捐纳制度，近人已有专书详论。在清代主要自为财政关系，然如雍正上谕所言，捐纳进身，可救偏重科举之弊，则其中亦不无政治意味也。

宋、元以降，商人入仕之途渐广，此与一般社会经济之发展，关联自极密切，在思想上，亦有反映。如宋元儒者，已不讳言治生，明末黄梨洲，已有工商皆本之论，清代沈垚（《落帆楼文集》）更谓“古者四民分，后世四民不分。古者士之子恒为士，后世商之子方能为士。此宋明以来变迁之大较也”。其言虽近于偏激，亦有相当根据。

秦汉所谓市籍，至少延至唐代。中唐以后，政府对于市场之管制，大见松弛，对商人之特别注籍，似亦不及以前之注意。明代户籍，分军民匠灶四大类。商人似亦属于民户。清代《嘉庆会典》有“军民商灶”之别，然此所谓商，即上文商籍之商，专指盐商而言，不得误解为一般商人。惟以商人当行及纳税（如门摊、铺税等）之故，政府对于孰为商人，及各商资力之大小，亦当有相当了解。保甲调查，亦分住户铺户，此在19世纪之纪录特为显著，京师所在，固不待言，如《津门保甲图说》（1846年）所记天津各区人户，分类详细，数目似亦相当可信也。

政府就商人收取关卡通过税及落地税等，几于无代无之。关卡之弊，记述议论，亦复多有。工商当行，在政府视为应尽之义务。然行户采买，名为给值，实多白取。所谓和买、坐办等，皆是此类，深为商民之患。就一般税役而论，明清虽有以货币代实物之趋势，实际负担，仍属不小。惟清代在未创设厘金之前，税额较之前代，似为稍轻。

牙行中之官牙，领有牙帖（纳费），实只相当于唐代之市司，除介绍买卖外，并可评定物价，有时且可为商人之居停主人。在水路则有埠头，亦称船埠头，其作用与牙行同。牙行之作用，与同业商人自组之行，有时相辅，有时相竞，其关系殊为微妙。在政府用为统制之工具，则无甚异同。政府对物价与币值之控制，普通最重视米粮价格与银钱比价，对米粮与货币之流通，有时亦加管制。惟自宋元以后，亦不时有人论及过分统制之恶果，提倡自由流通，此亦经济发展之反映也。

政府利用商人之一常法，为发商生息。此在若干情形之下，对商人可能有利。但商人须负责偿还本息，往往为难。至于盐商洋商等之捐输报效，名曰情愿，号为踊跃，实际则多出强迫，不过政府与商人分利之美名而已。

一般言之，清政府对商人，尚属宽大。商人之苦于苛虐者，罢市、请愿，乃至短期暴动，虽有其例，大规模之变乱，则未有商人为领袖者。此中因素，虽甚复杂，与政府对都市商人统制之和缓，似不无关系也。

一、导论

这篇关于政府对城市商人之统制的文章并不是一篇研究论文，文中所提出的数点建议只是一个社会史学者所做的一般性的观察，希望或可作为进一步讨论的基础。文中“商人”一词是用的广义，包括各种商人与生意人，固定的与流动的，甚至牙人（经纪人），经营当铺、钱庄的人，以及投资于传统手工业的生意人。这样使用的理由是中国传统上把这些人都称做“商”。“铺户”一词，是登记职业用的，差不多包括所有从事各行生意的人。“店”这个字或指商店或指旅店。因此商人一词必须使用广义才能把一些有意味与相关的事实包括在内。“城市”一词也是用的广义，兼指城、镇与郊区，而不限于城墙以内的地区。事实上，通称为“镇”的市场中心，大抵是没有城墙的。商人只要是在城市做生意的都可称为城市商人，虽然他并不一定住在城里。“统制”这一词包括与商人的地位、活动以及税役等有关的规定与限制。

本文的讨论集中于清初到鸦片战争（1644—1840年）这一段时期，换言之，即是传统中国开始受到西方势力的空前冲击以前的两个世纪。这段时期特别令人感觉兴趣的理由，其中之一是这段时期内，中国的统治者是几位相当开明而且非常能干的异族皇帝；这段时期中国正经验到社会与经济方面重要的变迁，即是中国大陆学者称为“资本主义萌芽”或初期成分者。[1]此外，中国在这段时期仍保留有许多传统的面貌。

一般对传统中国只有初步了解的研究者，可能认为旧社会商人的地位是这样的：农人所从事的职业是“本业”，相对的，商人与工匠的职业被视为次等的、非基本性的“末业”。此外，商人多被视为奸狡、惟利是图，因而受到轻视。他们的投机、操纵物价、囤积货财，都被认为不但害及消费者（特别是无助的农民），也对整个经济有害。商人的这些活动有违于公正与安定的原则，因而各种规限与税役必须加在商人身上，对于他们的地位必须加以降抑。但是，像这种一般性的说法至多不过是粗略的说明罢了。

这种一般性的说法所以流行的一个原因，是受到古代中国某些时期的史籍的影响。差不多三十年前，如果中国学生曾读过一点点中国的正史，很可能不是《史记》，便是《汉书》；前者的范围是从中国古代至西元前100年左右，后者则从西元前206年到西元23年。上述的说法大部分便取材自这两部史书中谈到食货与商人的篇章。[2]那时候大学里中国通史的课程仍然只着重于古代史方面。比如就制度史来说，教授们认为只要说明与讨论汉代的制度史就可以，因为后代差不多都是因袭汉代的模式，只有很少的修改与出入。

当然，中国古代史与中国第一个官僚帝国确有许多值得研究之处。简单地说，在战国时代（西元前403—西元前221年），政治、社会与经济上巨大的动乱与变迁中，游士、游侠与行商坐贾这些人变得非常流动而活跃。他们成为各独立邦国以及后来帝国的政治资本。因此，他们可能是艾森斯塔教授（S.N.Eisenstadt）所称的“自由浮动资源”的最好的例子，对于他所谓“历史性官僚帝国”之成立，有过重要作用。[3]

到西元前221年秦统一各国，这个中国史上第一个帝国要面对的问题是如何处理这些自由浮动分子。明显的办法是统制，包括操纵与利用——为了政府的利益，绝对不能让他们自由集附到另一个政治中心，或是自己形成一个有影响力的集团。秦朝只是短短的十几年（西元前221—西元前207年），未能完成这项工作。它的失败也许由于过分注重法家思想，过分独裁。汉朝从这里学到教训，成绩较好。当温驯的儒家学者（借用顾里雅教授H.G.Creel的定义：儒乃懦弱者也）成群地协助或加入汉朝的统治集团，中国官僚帝国的模式便开始形成了。

汉代是否真正采用压制商人的政策是值得讨论的。支持这方面看法的人会说，商人得缴纳额外的重税，他们不准拥有土地，不准穿着丝绸，他们的子孙不得做官，他们的活动在政府有专卖权的一些基本货物上受到限制。事实上，上面这些说法，除了有关纳税那一项之外，大多数是不难修改的。一个富裕的商人可以很容易放弃他登录的商人身份，变成一个地主，而仍然做谷物、丝帛或其他生意。汉高祖命商人不得衣帛，这道命令恐怕当时并未认真执行过，以后更是完全被忽略了。雄心勃勃的汉武帝即位后，即打破政府不任用商人为官的规定，两个在盐铁买卖上非常成功的商人成为他的主要参谋。把盐铁收归国有的建议，就是他们提出来的。他们主管专卖事业之后，就引进更多的生意人担任政府官职以协助他们办事。桑弘羊，贾人之子，精明而有谋略，深得武帝信任，由侍中官升御史大夫（副相）。由此看来，中国第一个持久的帝制朝代——汉朝，对商人的态度就已经是模棱两可的，至少在一段相当时期内，政府是有意兼用一种对商人限制、征税而又加以利用的政策。

在后来的朝代里，商人的命运也走着一条曲折的路途。为了解某一段时期商人的地位，一般历史背景的知识是需要的，因为只有与其他时期商人的地位相比较，才可能对某一时期的情形得到一个有意义的评价。

二、政府对城市商人的统制

如果回顾一下清代最初两百年间政府对城市商人的统制，很明显的是，这段时期我们见不到什么特别的障碍妨害商人改善他们的地位；政府对商业活动的控制是有限的，加于他们身上的课税与勒索，相对来说较轻（或至少不特别重），另外是，在统制的执行上，往往都离不开“保”与“包”这两个古老而特别重要的观念。我们可以先从最后一点谈起，以作为了解的背景。

“保”与“包”这两个观念与“报”不可相混。关于“报”我已有另文谈及。这三个观念都是传统中国盛行的观念，而且还继续到现代。在“保”与“包”这两个观念中，“包”流行较晚，大致是自宋代以降，这点也许可以反映出中国从宋代以来就对有限而可确保的利益或结果越来越感到兴趣。

保的观念几乎在政治、社会与经济生活的每一方面都可以发现。参加科举考试、进入官场、担保贷款、申请护照等等，都需要某种地位的人或某级以上的店铺担保。几个人或店铺联合起来担保的称为“连环保”，执行地方警卫与地方统制的保甲制度，是中国史上最为人熟知的制度之一。包的观念最常见的是包税（常与另一个字“额”连用）此外还用于包车、包船、包工乃至包饭等等。

我们可以就商业活动范围之内举出更多的例子：政府核准的牙行的一个作用是保证某种程度内的公平交易。政府要求商人行会的领袖负责保证会员的行为，而且要供应清廷官方所需要的应用物品（这些往往牵涉到所谓规费以及类似的勒索）。有引票经营盐运的商人首领称做“总商”，责任重大。经营出入口贸易的“公行”，有时称为“保商”，必须负责一个港口的对外贸易。大规模的商业组织，政府往往要他们成为多头制，以便维持制衡。这种预防办法，类似政治圈内所使用的，例如数名省级的高级官员并列。这是中国统治者从历史上得到的经验，知道倚重惟主管首领是问与联合负责的原则。

（一）地位与登记

在清朝统治下，阻止商人爬上政治阶梯的障碍，显然很少。中国帝制早期的几百年内，统治阶级经常妒忌地守住他们的政治权力，商人即使想占一席地位都极端困难。隋代（581—618年）所建立的进士制度，一直成为学者经由考试进入官场的最佳途径。但是这项考试，在隋唐（618—907年），以及辽代（907—1125年），对商人、工匠及其子孙是不开放的。[4]这种歧视政策到宋代（960—1279年）似乎减轻了不少。1044年颁布的规定要进士级的考生之间组成相互担保的团体，每一组三人（首都区开封府内五人）。担保的条例有一项是“身是工商杂类，及曾为僧道者”不得取应。条文中所用的“身是”与“曾为”两词似乎指出，出于商人家庭而自己不是商人，或甚至曾为商人而目下已非商人，都准许参加考试。如果我的解释正确，这点值得研究中国历史的人记在心里。同时要注意的是，在金（1115—1234年）、元（1206—1368年）两个异族入主的朝代，似乎没有禁止商人或工匠参加考试的规定。因此我们可以说，最近数百年中商人已经得到了政治解放。

事实上，在明清两代，盐商还有一项特权，可以令其子弟注册入“商籍”，参加生员考试，以进入商人居住地与经商地的学府，而不必如一般人须返回本籍才能参加考试[5]。此外，学府中特别为商籍学生保留名额，这些生员以后多半在省城参加考试。这种特权无疑地为清代盐商的后代造就了几百位进士，与更多的举人。何炳棣教授在他的研究中，曾举有数字。[6]把这些资料大略地再检查一遍，可以发现这些举人进士大多数是在18世纪通过考试的。

令人感兴趣的是，为盐商家庭子弟设置学校的制度，可以追溯到元代。1299年，一位蒙古籍的盐政在河东为盐商家子弟设立了一个学校，称为“运学”。注册的学生称为“运籍”，这名词是“商籍”的前身。这件事以后在16世纪末，被人提出来当作在别处成立类似设施的前例。[7]也许，就元朝来说，给予商人特权是很自然的事，因为蒙古的统治阶级十分依赖维吾尔商人与中国商人给他们带来的巨大利润。

除了考试以外，商人获得荣耀乃至官位的另一途径是“捐纳”，这是一种花钱买头衔、职位的制度。卖官鬻爵自然不是新事，它甚至可以追溯到汉代，但清代的制度无疑地是最完备，而且是最被倚重的一项主要收入。在18世纪早期更是重要。这个制度显然也包含有政治动机。正如雍正皇帝曾公开承认，有才能的人不由正途，而借着捐纳等非正途出身，可以平衡由科举出身者造成的过分影响力。在理论上，正规的捐纳，虽然本身不是正途，却是让生员得官或小官取得晋升的主要台阶，当然也有例外的情形。实际上，所有的富人都能为他们的父母买一个荣衔，并有不少替自己捐买监生、荣衔甚至官职者。富有的商人任意利用这种机会不难想像得出，18世纪的盐商就可以举出很多例来。商人捐官这件事，在19世纪下半叶曾经遭到章奏强烈的反对，但是清政府不能也不肯放弃这笔每年给国库带来几百万两银子的财源。有人曾说，这一大笔收入使得清代早期统治者不必重视商税，结果是商人得利。此外让人感兴趣的一点是，大约从1851年开始，旧式银行称做“银号”者，为人办理捐纳而大赚其钱。[8]

在结束我们对商人地位的讨论之前，我们需要注意到明清两代社会系统的流动性，这点何炳棣教授已有畅论。[9]其中很有趣的是，我们可以看到家庭分工的例子，父亲或兄弟经营家中的田产或生意，而让儿子或另一个兄弟去读书、参加考试。清代学者沈垚（1798—1840年）曾上溯到宋代，认为这种经济基础是帮助考生成功的重要因素。沈垚认为，从那时候起，所谓四民的士、农、工、商已有了结合与混合的现象。[10]另一位清代学者钱大昕（1728—1804年）也注意到，宋元时代的儒家学者已经鼓励学生首先应获得适当的生活方式（谋生方法），这样才可以使他们在进入官场前专心读书，日后在任位上才能维持正直与清廉。[11]农夫的职业当然是基本的，一个诚实的商人或制造有用而非奢侈品的工匠，他们的职业也可视为基本的，黄宗羲（1610—1695年）曾强调过这一点。[12]这种态度上的改变，无疑地反映出当时的社会环境。在一个较为流动的社会里，不只富商成为有威势、有影响力的人物，就是普通商人也发现他们的地位改善了。另一方面，我们也不能说，古老的轻商观念，此时已经归于消灭。举例来说，乾隆皇帝在1742年下诏免除米与豆在国内所有的通过税，诏令中他依然提出“重本抑末”的老调作为理由。[13]

与商人地位密切关连的问题是他们在人民中如何登记。中国历史上，登记（著籍）一直是政府统制人民的一项重要手段。从帝制中国开始，正规商人就得登记在“市籍”项下。秦汉时代由于用兵频繁，有时那些名字登记在市籍下的人是第一批被征召入伍的，然后是那些以前曾入市籍的人，再其次是那些父亲或祖父入市籍的人。[14]

市籍的登记至少继续到唐代，那时候由政府密切统制与监督的城内集中市场颇为繁荣。关于唐代的市场制度，杜希德教授（Denis Twitchett）曾有精辟的论述。[15]但是到了唐代后期，这种市场制度开始衰落，大多数城市市场的规定都被忽略或遗忘，很可能不久以后市籍登记便终止了。

在明代，户口的登记主要分为四大项：军、民、匠、灶（制盐者）。[16]工匠有专籍，因为他们必须轮班应差。明中叶以降，班匠可以纳银代差，渐渐得到解放。

军、民、匠、灶四种户籍在名义上延至清初。《嘉庆会典》列举“军、民、商、灶”[17]，这一条很容易引致误解，因为此处之商即上述之“商籍”，单指盐商而言，而非指一般的商人。

户口的登记从1772年正式成为保甲制度的一部分。然而，保甲制度起初并未认真执行，直到1813年冬天，国内发生一连串暴动事件，特别是这年秋天“天理教”的一次暴动，震动了北京皇城，以后保甲制度才比较认真。清代的保甲制度并不是划一的，大致来说是“门牌”的登录以及登记入籍。登记的事项包括户长的“生理”或“行业”。这分为“住户”或“民户”，与“铺户”两个主要项目。有趣的是，铺户的登记只包括那些不与家人同住的店家（我们可以称为离家商人）。店主与家人同住的则归入民户。我们需记住，在中国帝制时代，远离家乡的老百姓很可能引起别人的猜疑，他们得随身携带执照或护照之类的文件以证明他们的身份。

根据1851年秋天的官方报告，北京的内城（西洋文献称之为“鞑靼城”，因为大多数居民均为旗人）住户七六四四三户，铺户一五三三三户。[18]在北京的外城或所谓“中国城”，铺户的数目可能更多些。另外从天津在1846年施行保甲制度下登记的民众，我们可以发现某些有趣的项目与细数。[19]生意人分成三个项目：“盐商”、“铺户”与“负贩”。在天津城围内登记的九九一四户中，盐商一五九户，铺户三一三二户，负贩一九三五户。在东郊，即东城门外，登记有七○七七户，其中一一○户为盐商，二九七五户为铺户，一三三○户为负贩。在北部的六六三五户中，盐商五二户，铺户三一九六户，负贩七九九户。其他西郊、南郊、东北郊与西北郊四个郊区，登记的户数较少。但在这些区中，生意人三项登记的总数仍超过总户数的三分之一或接近半数。这些显然相当可靠的数字，很可指出在我们所讨论的这段时期的末期，天津市的商业化程度。

（二）限制、征税与利用

唐代的各种民法与刑法包括许多关于市场的详细规定，但清代的《大清会典事例》与明代的会典相似，对于贸易与商业方面较少提及。会典中的“市廛”即市场统制一节，仅包括短短的五项：经纪业务、公平价格、市场的独占（把持行市）、度量衡，以及市场上出售的衣料与用具的品质标准。除了第一款内规定私营经纪业务为非法（私充牙行埠头），这点是从《明会典》中抄袭而来，其他各款都依照唐代标准而制定。[20]关于上述最后两项事务的规定，其起源最为古老，也可能最不受人重视。晚清的法律专家薛允升氏（1820—1901年）曾特别感慨这方面执行的松懈，他强调维持货物品质与统一度量衡的重要性，但并未引起作用。[21]

根据禁止私充经纪的一款，在城镇乡村的各行业的经纪人（诸色牙行），以及类似泊船地方（船埠头）的经理人，应从殷实人中选出来担任。政府发给他们盖有官方印记的登记簿，让他们记录来往商人或船主的姓名、固定住址、通行证号码，以及货物的数量。登记簿每月要送交政府当局检查。那些未经官方核准而营经纪业务的人应受杖刑六十大板，他们所收取的佣金（牙钱）应予没收，如果官方认可的经纪人或埠头（官牙埠头）有掩饰藏匿，应受杖刑五十大板，然后免职。关于物价一款，将制定公平价格的责任给予经纪人（行人，即牙行），而非唐律上所规定的市场官员（市司）。[22]

经纪人的作用是在买者与卖者中间协调商定一个合理的价格，除此之外，许多经纪人也充当店家，招待来往商人的食住与寄放货物，当然也照章收费。这些费用是在交易时所收的佣金（牙钱、用钱、或称行用）之外的。经纪人也可能充任商人买卖的代理人，为他们接洽贷款，安排他们的交通与货物运输问题。因此经纪人在贸易商业上能担任不少职务。[23]政府要借着经纪人以钳制商人是很自然的事。

在理论上，只有有执照的经纪人才准许担任这些职务。根据规定，这种执照（称做“牙帖”）只有省级当局才能发给，并有固定的名额，这个执照每隔五年检查一遍，并重新发给（北京从1725年开始），同时，名额亦可能变更。[24]实际上，省区与地方官员常常不顾名额而自行发给执照，因为这项业务是州县政府收入相当可观的一个来源。对省府与清朝政府而言，从经纪人的执照所收取的费用只是非常小的数目，但是，自太平天国叛乱以来，情况有了重大改变，从那时起，特别捐也由经纪人收取，并与厘金合在一起。在湖北与湖南，从经纪人处收取的年度捐税估计有他们的牙帖费的一百倍之多。[25]

这些经纪人，特别是那些私营的，带给商人的麻烦实多于帮助。当某一行业的商人组成一个行会后，通常都会被与他们这一行打交道的经纪人控制住。通常借着使官准牙人或为本行会员而达到目的。有关这类做法的例子我们在北京18世纪时组成的行会记录上可以看到。[26]

在这里要强调的一点是，“行”这个字在中文里经常是表示“行业”而非“行会”，除非我们将行会的意思扩大到包括那些没有会馆或公所，甚至没有行规的原始行会。政府热衷于让商人按行业组织起来的主要理由是配合它对各种物资的需要，这种要求可能来自清廷当局或任何大小衙门。商人有义务应付这种要求，称之为“当行”，意思是“本行的当值”。理论上，政府需要的物品应该用“时价”或“实价”买进。事实上，真正照办的很少，即使政府付给相当的价钱，经办人在中间索取的陋规也成为当值商行的一个沉重的负担。1738年，清廷诏令全国各大小衙门纠正这种陋习。[27]在雍正皇帝名义下发布的《州县须知》，警告地方政府官员，不得向商人与百姓强索物品。[28]然而这些命令与警告实际上完全没效。举例来说，为了供应清廷光禄寺所需用的猪肉与鸡，北京城内宛平与大兴两县特别从这两行里挑选了殷实的商人来负责供给，结果害他们从1752年到1756年之间，每年都赔上两三千或三四千两银子，直到这两行在1756年被废除为止。[29]

在明代末叶以后，这种“当行”制度照规定本可纳银替代。16世纪时，北京城的铺户分为九等，每户每年要付一钱至九钱的银子称做“行银”，以免当行。到1582年冬天，政府批准一项奏折，免除最下三等的铺户缴纳这笔行银。中间三等的铺户，其资金从三百两银到五百两以上的，以及上三等的铺户，其资金多至数千两银者，则需继续缴纳。同一年早期，政府也批准北京城内两县中一三二家官方认可的行业中，三二家小号得以免除缴纳这笔银钱。[30]

到清代，北京城内的两县获准从内城以外的铺户收取这笔银钱。上等的铺户每年缴付五两银，中等每年二两五钱，下等的铺户则免缴。北京内城九门内的铺户得以免缴的理由是他们得负责整理街道，特别是填土、洒水的工作。

大多数城市中对商店开设的地点都没有严格规定，只要不太靠近衙门损其尊严就行。但是暂时性的货摊与浮摊不准见于大街上。在皇都里的规定就比较严格，举例来说，北京的内城不准开设戏院与旅店。1756年所做的调查，显示城里有十五家旅店，其中有好几家“关东店”，显然是为在满洲做生意的商人开设的。[31]还有四四家店铺，夜间也经营旅馆业。所有这些店铺都得迁到外城去。另外七二家经营猪肉、酒、鸡、水果与烟草的店铺则准许留在内城。[32]叫卖的负贩有时不准喊出某些被认为是忌讳的字眼。在1648年与1649年，北京城内的负贩曾被禁止叫卖，因为多尔衮嫌他们的声音太吵。[33]

有关这方面我们可以再加上一点是北京城内一般都实行宵禁，特别是在内城。为了便利警卫，许多较小的街道，特别是通往大道的傍道都树立起栅栏，夜晚关闭，禁止通行。根据《金吾事例》，1729年北京外城有四四○个官准的栅栏；1763年内城有一○九九个栅栏，皇城内有一九六个。这些栅栏似乎一直维持到19世纪初年。[34]栅栏与宵禁令人想起唐代首都长安城内坊门夜闭的严格规定。

北京城内的九个城门的征收货物税都是在恶名昭著的崇文门税关管制之下。这从明代起就如此，一直继续到民国时。更早的朝代当然也有类似的税。记得南唐时代曾有官吏幽默地对皇帝说，首都不下雨的原因是雨恐怕在城门要缴税。结果，皇帝下令减轻这些税捐。[35]

像清朝其他的税关，崇文门的税关也有年度的定额。在本文讨论的这段时期内一般定额是十万两银多一点，这笔数目不算大，留给税吏足够的余地去充实他们自己的腰包。[36]税关的主管者照规定都是旗人，他们在这位置做了几年后，大概都得到类似的下场：借某一个罪名免职，其大部分财产充公，但也罕见完全破产之例。清朝皇帝与这些权贵税吏之间的关系正像渔夫与他豢养的鱼鹰之间的关系。

州县政府的一个重要收入来源称做“落地税”的，是对所有进入其管辖的地方市场的货品所征收的税。这些税通常都是包给衙门的衙役或牙子，自然有滥用职权与腐败的情事。1735年清廷曾下令废止所有乡、镇、近郊的落地税，仅保留县城与州城的。[37]这道命令是否曾广泛执行以及行之多久却是值得怀疑的事。

总结来说，清代最初两百年内对地区间以及地方贸易的税收并不特别重，尤其当我们比较一下明代万历朝（1573—1619年）的下半期，朝廷的宦官在征收全国商业税那种无情的勒索时，或是比较一下从1850年代加之于各省的厘金，给朝廷从1869年到1908年每年都带来一千四百万至二千一百万两银子的收入时，就可明白。[38]

在物价管制方面，政府关心的主要是谷价的稳定，以及铜钱与银两的兑换率。为了防止大量囤积铜钱与米谷，政府曾试用各种方式，下令禁止这种事情发生。当谷价太高的时候，最有效的办法显然就是抛售政府所存积的米谷。在北京，官方的米局特别用来供给旗人。由于北京城人口众多，因而有严格的规定管制米谷运出京城。原则上只有少量的米，村民买来供自己食用的才准许运出京城。此外，不论米或谷都不准运出城或甚至京畿地区。[39]清廷对未去壳的谷子管制更为严格，原因是谷子能保存得更长久。

银与钱的兑换是钱铺的主要生意。通常，北京城的钱铺得五家一组连合互保。18世纪有一段时期清廷依靠官方认可的钱币经纪人（称做钱行）来稳定兑换率。[40]大体来说，雍正与乾隆两朝在北京的成效相当好。兑换率的波动幅度是从八○○到一一○○文铜钱对一两银，但大多数时间都维持在八五○或九五○上下。[41]谈到钱铺间的连保，可注意的是类似的要求初期并未应用到旧式的银行（称做银号）上面，直到1860年数家半官方的银行宣告破产以后，银号才需要连保。由此看出，尽管银本位经济已经继续了几个世纪，政府对银的控制总是落后一步。

利用城市商人的一个主要方式，是托付给他们一笔公家资金作为投资之用。这种制度称做“发商生息”，在前几个朝代就有了。受到这种资金的商人绝大多数都是当铺与盐商。政府收取的利息是月息一分至二分。一般来说，这笔利息是指定作为特殊用途的。[42]雍正皇帝特别爱好这个制度，用所得的利息来资助八旗与绿营军。清廷的内务府也非常依赖发商的利息为其财源。乾隆皇帝时仍继续这个制度，后来他改变主意，1759年时宣告发商生息于政体有损，下令加以限制。1769年，他下令将已经发给长芦盐商的资金改称做“赏借项款”。[43]使用这个新名词的理由是政府所订的利率较法定准许的月息三分利率低得多。但是，旧的名词与制度仍被清廷、省府、州县政府以及半官方或非官方的组织继续使用下去。可注意的是信托资金对商人并不一定有好处。1783年长沙府的当铺为某种原因婉拒从省府接受更多的资金，托辞说他们手头已有足够的信托资金了。[44]

另一种利用城市商人的方式是“自动捐献”，称做“捐输”或“报效”，这是城市商人资助政府的军备、公共建设、水患、饥荒的救济，皇帝出巡与皇帝生日等的开销。根据两淮地区盐政管理官方记录的数字显示，在1738年至1804年之间，这个地区的盐商在四十多个场合总共捐献了三千七百五十万两银子。[45]根据盐政的报告，盐商们都是“情愿”甚至“踊跃”认捐，恭请皇帝“赏收”。在另一方面，盐商们又不时请求以分期付款的方式来捐献。有几次，皇帝对商人的忠诚报效与急公好义表示嘉奖，而只赏收一半的捐款。商人在这种情形下所得到的直接回报不过是所谓“议叙”与名义好听而已。然而在其他场合，皇帝为显示对商人的仁慈宽大，准许他们免费取得额外的“余盐”，或是允许他们延期偿付滞纳的盐税与信托基金的利息。皇恩的殊荣，甚至免除盐商对政府的负债，1780年减免了一百二十万两银子，1782年与1784年大约是三百八十六万六千两。[46]另一批重要的自动捐款，是由广东的盐商与洋行（行商）所认捐的。从1773年至1832年间的捐款总数大约是四百万两银子，数目虽不是大得惊人，也是一笔巨款。[47]

如果能比较清代各皇帝所采行的经济政策，特别是有关商业贸易的细节，甚至比较一个皇帝在不同时期的经济政策，将是一件极有趣的事。遗憾的是这样的比较已远超出本文的范围。然而我们可以强调的是康熙、雍正、乾隆三帝都绝不是蒙昧无知不肯用心的专制君主。康熙皇帝有一次在1717年曾夸称他对盐政方面深刻的了解。[48]雍正皇帝无疑地非常通晓一般的财经事务，但1728年有一次也承认他并不特别了解有关茶政上各种渎职情事，以及有关茶与马的贸易，因此不能给负责的官员特定的指示。[49]乾隆皇帝在1748年曾有很合理的意见，认为一般来说还是把市场方面的事交给人民，准许他们自由流通货物较好。政府的干涉，虽然出于好意，常常由于处理不当而产生扰民的障碍。[50]清代皇帝一般都可以称得上对商人宽大而同情的。但在另一方面，他们对商人有时也出诸操纵甚至有喜怒无常的态度。

三、城市商人的反叛取向

相对于政府统制，重要的一点是检讨商人是否曾抗议或反叛这种统制，和采用什么方式。有关这方面讨论，我们可以19世纪学者汪士铎（1802—1889年）所做的观察作为起点。他认为，商人与城市的文人一样，似乎是最不倾向反叛的，或者我们可以说，他们表现非常低度的反叛取向。汪士铎在1853—1856年间，因太平天国之乱曾躲藏在长江下游的南京与绩溪之间，这段时期所保存的日记中有如下一段：

天下最愚，最不听教诲、不讲理者乡人。自守其所谓“理”而不改。教以正，则哗然动怒；导以为非为乱，则挺然称首。其间妇人又愚于男子。山民又愚于通涂之民。惟商贾则巧猾而不为乱，山民之读书者不及也。在外经商之人，又文弱于当地之商贾。知四民之中，最易作乱者农。工次之。武生次之。山中之士次之。商贾之士次之。城市之士，则硜硜然可以决其不为乱。[51]

这种议论显然是概括而充满偏见的，但我们或可了解这不完全是处于一个大动乱暴力时代所发的愤激之言。无论如何汪士铎是个相当独立敢言的学者，他不受传统儒家思想的束缚，而且是热心于提倡改革、恢复秩序的人。至少他在这段话里提出一个启发性的见解，就是在传统的四种功能团体中，城市商人与城市文人的反叛取向最低。

进一步说，根据汪士铎的推论做一初步检查，显示其中确有一些历史的真实性。[52]中国历史上曾记载无数次农民叛变，但几乎看不到任何城市商人领导的叛变。从唐宋时代以降，我们看到走私盐商与海盗商人的记载，然而他们行动的范围似限于山林、沼泽、海岛与外海上，有时在他们势力范围内，他们也会打劫城镇，因而可算是城市商人的敌人。在明清时代，关于矿工、伐木者与城市匠人的暴动与罢工事件，也有所闻。

当然，一个社会中叛乱取向的问题，或广泛地说暴力取向的问题，其研讨不一定只限于功能团体。举例来说，这个问题可以就个人或团体从年龄、性别、地位、财富、角色、功能、教育、风俗、传统或是其他的角度来探讨。甚至汪士铎所作的粗略的推论也提到其中几方面。然而，对这个问题更深一步的方法论，却已远超出本文的范围，而且坦白地说，也不是作者能力所及的。为说明城市商人在清初时代抗议与叛变的性质与程度，我们可以看看下面四个例子，也就是有的学者所谓的“商人与手工业者反抗清朝封建统治的斗争”。[53]这四个例子记述的事实均是“罢市”，就是商人与生意人拒绝做生意以示抗议。

（一）1660年在山西潞安的罢市

这次罢市的背景是源于明代生产御用丝织品称做“皇紬”的制度。在山西潞安做这一行生意的“机户”，必须以固定的官价供应这项货品，而官价显然是经常不足以抵付生产所需的开销。明末清初时代，皇紬年度配额是三千匹（一匹为六丈八尺）。1652年诏令将配额减去一千五百二十匹零四丈八尺，每匹的价格则从十两银子增至十三两。1658年，配额又由一千四百七十九匹零二丈减去一千一百七十九匹零二丈，因此实际上所需要的仅是三百匹。但是到1660年，机户发动一次罢市，据说将其织机焚毁，手里捧着账簿记载着他们的损失，准备向北京城进发，直接向皇帝请愿。
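上面两次削减配额的数字可以用一段简短的代码核对。以下只是一个示意性的草稿：一匹为六丈八尺出自原文，而“一丈等于十尺”是这里补充的换算假设；函数名 to_chi、from_chi 以及各变量名均为说明而虚拟的。

```python
# 假设：1 丈 = 10 尺（原文未明言）；1 匹 = 6 丈 8 尺（出自原文）
CHI_PER_ZHANG = 10
CHI_PER_PI = 6 * CHI_PER_ZHANG + 8  # 一匹折合 68 尺


def to_chi(pi=0, zhang=0, chi=0):
    """把“匹、丈、尺”折算成以尺为单位的整数。"""
    return pi * CHI_PER_PI + zhang * CHI_PER_ZHANG + chi


def from_chi(total):
    """把以尺为单位的长度还原为 (匹, 丈, 尺)。"""
    pi, rest = divmod(total, CHI_PER_PI)
    zhang, chi = divmod(rest, CHI_PER_ZHANG)
    return pi, zhang, chi


# 1652 年：三千匹减去一千五百二十匹零四丈八尺
quota_1652 = to_chi(pi=3000) - to_chi(pi=1520, zhang=4, chi=8)
# 1658 年：再减去一千一百七十九匹零二丈
quota_1658 = quota_1652 - to_chi(pi=1179, zhang=2)

print(from_chi(quota_1652))  # (1479, 2, 0)，即一千四百七十九匹零二丈
print(from_chi(quota_1658))  # (300, 0, 0)，即三百匹
```

在这一换算假设下，两次削减后的余额与文中所说的“一千四百七十九匹零二丈”和“三百匹”相符。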

据潞安一位朝廷官员王鼐的奏折，这些机户在明末时原有三千张以上的织机，但大多数都已破产，因为他们得依照政府命令按行服务，所谓“抱牌当行”，结果是他们生产的丝得不到适当的偿付而大受损失。从1644年到1660年，所留存的织机仅两三百张。据奏折所言，皇帝的削减配额，延长限期，先行付款，以及“合理实价”，使得机户争着愿为皇室服务。但是本省官吏的取用以及外省采购使者的要求勒索，却使他们遭受损失。理论上，机户们可以从他们出售的丝得到官价付款，但是经过层层勒索，特别是付差官差役的催紬费、验紬费及纳紬费，实际所余无几。

我们在王鼐的奏折中可看到很生动的描写：“臣乡山西，织造潞紬，上供官府之用，下资小民之生。……为工颇细，获利最微。……今年（1660年）四月，臣乡人来言，各机户焚烧紬机，辞行碎牌，痛哭奔逃，携其赔累簿籍，欲赴京陈告，以艰于路费，中道而阻。夫有簿籍，必有取用衙门，有衙门必有取用数目。小民含苦未伸，臣闻不胜骇异。”他接着建议严禁本省不得滥行取用，隔省不许擅差私造。从方志记载中，我们不清楚他的建议采行至何种程度，因为只说到山西巡抚下令立碑严禁。推想大概是，差役与差官不许继续强索，而机户也不许再度罢市。[54]

（二）1660年安徽芜湖的罢市

这次罢市是抗议芜湖内地税关过度的附加税与其他各种名目的勒索。根据阴历十月十三日御史李见龙弹劾户部郎中兼芜湖钞关监督郑秉衡的奏折，在郑秉衡的指使下，若干名不法的官吏征收额外的火耗与特别捐款用来充实其官邸的维持费用。郑秉衡还发明了“皇税”一词，对民船上装载的日用必需品甚至如薪柴与米都加以征税。结果是，全部地区的商民发动罢市三天，以1660年阴历七月十四日为始。本地生员韦譞佩等向总督与巡抚请愿，结果总督命令知县接受商民所具甘结，确认地方人民发动罢市是因为对薪柴与米征税。据奏折上说，御史闻知这事是得自于从芜湖到北京来诉苦的商人，因而有关这事的消息传遍京城。[55]

这项弹劾似乎并未发生多大效力，因为罢市的事件已经发生了一年，而且显然已不了了之。对我们来说，有关这次罢市最感兴趣的一点，是其行动的有秩序以及商人与士人间的合作。

（三）1682年浙江杭州的罢市

这次罢市是抗议土棍（地方流氓）与旗丁（八旗兵丁）的高利贷，他们对那些无力偿债的人捉去儿女以为抵偿，有时甚至牵连到负债人的亲戚与邻居。杭州北门的商民发动罢市抗议，这事传到一位同情人民的道台王梁那里。第二天，当王梁去与其他官吏会合调查这件事的途中，八旗兵王和尚等一共几百个人，拦住他的仪仗，辱骂他并打破他轿子的顶盖。这次不寻常的暴动，迫使总督与满洲将军连合上奏向朝廷陈明情况，结果皇帝下诏严厉处罚王和尚及其同谋者。这时候，总督则下令店铺恢复营业。这个例子中特殊的一点，是它说明了在一个征服王朝下政治与经济生活的复杂性。[56]

（四）1698年福建浦城的罢市

下面这段故事主要是根据直隶任邱人、出于书香门第的庞垲的墓志铭而来的。在戊寅年，即1698年（彭泽益误认为1758年），庞垲受命为福建建宁府知府。他到任后不久，传来报告说建宁府所辖的浦城县令，因为政令过于严苛，迫使人民反叛。城中愤怒的百姓趁着黑夜，攻击县府的“册局”，放火烧毁文件与记录，并杀死了一个当值的胥吏。县令害怕逃走，当地人民接着发动一次总罢市。庞垲得知这事，立刻赶到浦城，要求当地的教官与典史召集乡绅、生员与人民在明伦堂集合。在这些人面前，庞垲宣布县令的错误与罪状，并加以谴责，使士绅与人民气平下来。然后，他再提醒他们无法纪行为的不当。他让县府的财务与库房重新核对与收集未被焚的文件。他命令各行生意人恢复营业，城内秩序始告恢复。

在这时，总督郭世隆不满省中百姓攻击县府（称为围城），发动罢市的事件日益增加，想借此用高压手段压制罢市，以为警戒。由于县令与地方士绅间的强烈不睦，总督欲借不法结党、阴谋叛变的罪名惩罚所有的士绅。庞垲反对这个做法，他强调县令残酷作风的不当。最后，只有一名变乱者被处死刑，另二人流放。浦城百姓为感谢庞垲的大力相助，建立一个书院来纪念他。他死于1735年。[57]

显然地在福建省其他城市尚有类似的抗议与罢市的事件。当时的总督郭世隆（1643—1716年）出身山西的绿营。[58]上述故事中的县令是鲍鋐，沈阳人，以前曾任笔帖式（满文bitheshi，即书记官，七品、八品或九品），多半是个旗人。[59]从这件事也可以看出旗人与一般汉人的敌对。

本文选编自《东汉的豪族》

2024-12-14
薛其坤：探究微观量子世界

本文系讲演稿整理而得

欧姆定律是接近200年前由德国物理学家欧姆提出的一个非常经典的电学规律。它说的是：对一个导体而言，电阻等于加在导体两端的电压与流过导体的电流之比。大家都非常熟悉。换一句话来说，流过这个导体的电流正比于加在这个导体两端的电压，反比于这个材料的电阻。这个材料的电阻越大，它越绝缘；在给定的电压下，它的电流就越小。
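欧姆定律的这一关系可以写成 I = V / R。下面是一个极简的示意，函数名 current 和所用的数值都是为说明而假设的，并非讲演中的内容。

```python
def current(voltage_v, resistance_ohm):
    """按欧姆定律 I = V / R 计算电流（单位：安培）。"""
    if resistance_ohm <= 0:
        raise ValueError("电阻必须为正数")
    return voltage_v / resistance_ohm


# 假设电压固定为 12 伏，电阻加倍时电流减半
print(current(12.0, 4.0))  # 3.0
print(current(12.0, 8.0))  # 1.5
```

电压不变时，电阻越大，算出的电流越小，这正是上文“电阻越大，电流越小”的含义。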

欧姆定律讲的是沿着电流流动方向关于电压、电阻、电流基本关系的科学规律。我们很好奇，自然就想问“在垂直于电流流动的方向上，是不是也会有类似欧姆定律关于电流、电压、电阻关系的东西呢？”答案：“是！”

这就是欧姆定律提出50多年以后，在1879年由美国物理学家埃德温·霍尔发现的霍尔效应。霍尔效应实验是一个非常精妙的实验，他把这个导线变成了这样一个平板，当时用的材料是金。在垂直于这个金的平板方向上，再加一个磁场，当然沿着电流流动的方向仍然有欧姆定律的存在。但是由于这个磁场下，流动的电子受到洛伦兹力的作用，它会在垂直于电流的方向也发生偏转。

在这样一个磁场下，电流除了欧姆定律方向的电流在流动以外，电子还在横向发生偏转，形成电荷的积累，形成电压。这个电压就叫霍尔电压，这个现象就是霍尔效应。加一个磁场就可以产生霍尔效应，那么我们自然想问，是不是不需要磁场也能实现这样一个非常伟大的霍尔效应呢？答案也是“是”！

他发现霍尔效应一年以后，就做了这样一个试验，把材料金换成铁，靠铁本身的磁性产生的磁场，也发现了类似的霍尔效应。因为科学机理完全不一样，命名为反常霍尔效应。

不管怎么样，霍尔效应、反常霍尔效应是非常经典的电磁现象之一。为什么呢？它用一个非常简单的科学实验、科学装置就把电和磁这两个非常不一样的现象在一个装置上完成了。

当然了，霍尔效应非常有用。今天我给大家列举了一些大家非常熟悉的例子。比如测量电流的电流钳，我们读取信用卡的磁卡阅读器，汽车的速度计，这都是霍尔效应的应用。它已经遍布在我们生活的每一个方面，是一个极其伟大的科学发现，同时对我们社会技术进步带来了极大的便利。

这不是这个故事的结束。100年以后，德国物理学家冯·克利青把研究的材料从金属变成半导体硅，结果他就发现了量子霍尔效应，或者说霍尔效应的量子版本。他用了一个具体材料，就是我们熟知的每一个计算机、每一个芯片都有的场效应晶体管。这个场效应晶体管中有硅和二氧化硅的分界面，在这个界面上有二维电子气。就是在这样一个体系中，在半导体材料中，他发现了量子霍尔效应。

在强磁场下，冯·克利青先生发现霍尔电阻是量子化的，满足公式 R_H = h/(ne²)。h是以物理学家普朗克命名的常数，是一个自然界的物理学常数；n是自然数1、2、3、4、5……；e就是一个电子所带的电量。这是一个非常伟大的发现。为什么呢？我一说就明白，因为测到的霍尔电阻和研究的材料没有任何关系。不只是硅，可能任何材料都会有这个结果，它只和物理学常数，和自然界的一些基本性质相关，和具体材料没有任何关系。因此它就打开了我们认识微观世界、认识自然界的大门。
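量子化霍尔电阻 R_H = h/(ne²) 只依赖两个物理常数，可以直接算出各个平台的数值。以下是一个示意性的计算：常数取2019年国际单位制中规定的精确值，函数名 hall_resistance 是为说明而假设的。

```python
# 2019 年国际单位制中的精确定义值
PLANCK_H = 6.62607015e-34            # 普朗克常数，单位 J·s
ELEMENTARY_CHARGE = 1.602176634e-19  # 基本电荷，单位 C


def hall_resistance(n):
    """返回第 n 个量子霍尔平台的电阻 h/(n e^2)，单位欧姆。"""
    if n < 1:
        raise ValueError("n 必须是正整数")
    return PLANCK_H / (n * ELEMENTARY_CHARGE ** 2)


for n in (1, 2, 3, 4):
    print(n, round(hall_resistance(n), 3))
# n = 1 时约为 25812.807 欧姆，n 越大电阻按 1/n 减小
```

计算中没有出现任何与材料有关的参数，这对应上文所说的“和具体材料没有任何关系”。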

同时，量子霍尔效应给我们材料中运动的电子建造了一个高速公路，就像左边大家看到的动画一样，电子的高速公路上，它的欧姆电阻，平行于电流方向的电阻变成0，像超导一样。因此，用量子霍尔效应这样的材料做一个器件的话，它的能耗会非常低。

大家今天看到的是两条道的情况，是n=2。如果n=3，这个高速公路的一边就有3条道；如果n=4，电子的高速公路就变成4条道，所以这样一种理解就把自然数n，1、2、3、4、5、6、7、8和微观世界的电子高速公路密切结合起来。大家可以看到，我们对自然界的理解，对量子世界的理解又大大前进了一步。

冯·克利青在1980年发现量子霍尔效应以后，由于这个巨大的科学发现，五年以后他被授予诺贝尔物理学奖。

硅有量子霍尔效应，是不是其他半导体材料也会有量子霍尔效应呢？有三位物理学家在两年后的1982年就把研究的材料从硅变成了可以发光的砷化镓，结果，他们发现了分数化的量子霍尔效应：不再是一二三四，而是三分之一、五分之一这样的分数。1998年这三位物理学家获得诺贝尔物理学奖。

在我们这个世纪，大家都知道石墨烯，有两位物理学家利用石墨烯这个量子材料继续做一百年前的霍尔效应实验，结果发现了半整数的量子霍尔效应。随着量子霍尔效应的不断发现，我们对自然界，对材料，对量子材料，对未来材料的理解在电子层次上、在量子层次上逐渐加深，所以推动了科学，特别是物理学的巨大进步。

量子霍尔效应有很多应用，今天我讲一个大家比较熟悉的应用，那就是重量的测量。我们每天都希望测测体重，重量的测量无处不在。1889年国际度量衡大会定义了千克（公斤）的标准，是9:1的铂铱合金做成的圆柱体，以后的一百多年，全世界都用这个作为标准称重量。

但是在118年以后的2007年，我们发现这个标准变化了：减轻了50微克。一个标准减少50微克是一个巨大的变化，全世界的标准就不再标准了，而且随着时间的推移也会进一步变化。因此我们需要更精确，可以用得更久的重量标准。

在2018年的时候，国际度量衡大会重新定义了公斤的标准，那就是基于刚才我提到的量子霍尔效应，和另一个诺奖工作约瑟夫森效应，提出了一个全新的秤，叫量子秤或者叫基布尔秤，它对重量的测量精度可以达到10的负8次方克，而且是由物理学的自然界常数所定义的，1万年、10万年、1亿年也不会发生变化。这是我举的一个大家能理解的例子。

刚才我提到了三个不同版本的量子霍尔效应。它们需要一个磁场，就像霍尔效应一样，而且一般情况下需要的磁场都特别强，一般是10个特斯拉，也就是10万个高斯，这是非常强大的磁场。我们庞大的地球产生的磁场只有0.5高斯，我们要用的磁场是地球磁场强度的20万倍。能不能去掉磁场也观察到量子霍尔效应呢？我带领的团队与合作者一起，在2013年的时候完成了这个实验，在世界上首次发现了不需要任何磁场、只需要材料本身的磁性而导致的量子霍尔效应，或者叫量子反常霍尔效应。
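上面的磁场数字可以简单核算一下。以下是一个示意性的换算：1特斯拉等于1万高斯是标准换算关系，10特斯拉和0.5高斯两个数值取自上文，函数名 tesla_to_gauss 是为说明而假设的。

```python
GAUSS_PER_TESLA = 1e4  # 1 特斯拉 = 10000 高斯


def tesla_to_gauss(field_tesla):
    """把以特斯拉为单位的磁场换算成高斯。"""
    return field_tesla * GAUSS_PER_TESLA


lab_field_gauss = tesla_to_gauss(10)  # 实验所需磁场：10 特斯拉
earth_field_gauss = 0.5               # 上文给出的地球磁场：0.5 高斯

print(lab_field_gauss)                      # 100000.0，即 10 万高斯
print(lab_field_gauss / earth_field_gauss)  # 200000.0，即 20 万倍
```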

这样一个发现是不是也是材料驱动的呢？是的。我在这里给大家复习一下我们所熟悉的材料。在我们一般人的概念中，我们自然界的材料只有3类，导电的金属，不导电的绝缘体，还有一个是半导体，介于两者之间。

第一代半导体有硅、锗，第二代半导体有砷化镓、锑化汞，第三代、第四代还有氮化镓、碳化硅、金刚石等等。在研究材料和材料的相变基础上，包括量子霍尔效应上，有两个物理学家，一个是大家可能比较熟悉的华人物理学家张首晟，和宾夕法尼亚大学的Charles Kane，在这基础上他们提出了一个全新的材料：拓扑绝缘体，也就是大家在屏幕的最右边所能看到的。

什么是拓扑绝缘体？我给大家简单解释一下。这个图大家可能比较熟悉，最左边是一个陶瓷的碗，是绝缘的、不导电的。再朝右是一个金做成的碗，是导电的，叫导体。拓扑绝缘体就是一个陶瓷碗镀了一层导电的膜。如果把这个镀了膜的碗进一步进行磁性掺杂，使它有磁性的话，它就会变成一个只有边上镀金的碗。这个边上镀金碗就叫磁性拓扑绝缘体材料。

按照张首晟等的理论，它就可以让我们观察到量子反常霍尔效应。但是，这个材料是一个三不像的矛盾体：它有磁性，它要拓扑，它还要绝缘，我们还要把它做成薄膜，这就要求一个运动员篮球打得像姚明那么好，跑步像博尔特那么快，跳水像全红婵那么伶俐，这样的材料非常难以制备。为什么呢？因为大部分磁性材料都是导电的，铁、钴、镍都是导电的；另外，磁性和拓扑在物理上是很难共存的；还有一点，在二维薄膜的情况下，很难实现铁磁性，使这个材料有真正的磁性。因此真正观测到量子反常霍尔效应，在实验室看到它，是一个极其具有挑战性的实验。

我带领的团队和另外三个团队紧密合作，我们动员了20多位研究生，奋斗了4年，尝试了一千多个样品，最后在2012年10月份，全部完成了量子反常霍尔效应发现，完成了实验。我们证明了确实在边上镀金的碗（磁性拓扑绝缘体）中，存在量子反常霍尔效应这样一个新的规律。

今天，我特别把当时发现量子反常霍尔效应的样品带到了现场。大家可以看到很多电极，电极之间有方块，每个方块上就是首先观察到量子反常霍尔效应的样品。

这里我再给大家讲一下制备这个材料，对原子磁场的控制，对科学发现非常重要。这是其中一个例子，我们学生制备的，采集的一些照片。中间大家会看到，拓扑绝缘体碲化铋薄膜的扫描隧道显微镜照片，上头每一个亮点代表一个原子，更重要的是，在这个范围内你找不到一个缺陷。说明我们材料的纯度非常高，我们在其他材料中也能做到这个水平。

这是另一个拓扑绝缘体材料：硒化铋。大家可以看到，这么大的范围内，你只看到你想要的原子，没有任何缺陷，而且薄膜是原子级的平整，这为我们最后发现量子反常霍尔效应奠定了非常好的基础。

最近，我们继续在朝这个方向努力，我们正在攻克的一个问题就是高温超导机理这个重大科学问题。我再次放了博士后制备的研究高温超导机理异质结样品的电镜照片，大家从上可以看到有5个样品，不同的颜色代表这个异质结的结合部。大家可以看到，每个亮点几乎是接近一个原子，我们制备的异质结，两个材料的结合部几乎达到了原子尺度的完美，只有这样，我们才能在这样一个非常难以攻克的高温超导机理上有所作为，我们会沿着这个方向继续努力下去。

2024-12-12
贺雪峰：公私边界与国家权力

一、

2009年暑假到鄂东南宗族性地区调研，发现当地村民组特别重要，因为村民组基本上都是一个房头，十几户到几十户，一个姓，自家人，又是村民组，过去的生产队，村民组组长往往也是由本房头最有威信的中年人担任。房头是私，因为大家都是自己人，一家人。这个私是相对于农民家庭这个“小私”的“大私”。村民小组则是公，是国家划定的基本管理单位，是村委会下设小组，且村民小组长一般要由村委会任命（可以由村民推荐或推选），如果说国家是公的话，国家的公是“大公”，村委会是国家在农村最基层的行政建制，可以算是国家这个“大公”在农村最基层的代理人，则村民组就只是“小公”，是最小的“公”了。也正是在村民组一级，“大私”与“小公”重合，形成了国家与社会有效对接，村民组长对内利用自己人的身份来低成本解决矛盾，达成集体行动，提供超出农户的公共品，对外则代表房头利益，维护房头利益。村民组或房头内的事情都可以自治，国家就可以进行低成本的简约治理。国家不介入村民组或房头内的事务，房头就有自治的空间与动力，房头内就需要且会产生唱黑脸的人，说直话的人，也就具有相当的主体性。

2021年暑假再到鄂东南宗族性地区调研，发现之前的公私边界早已打破，国家这个“大公”一直延伸到农户，之前公私同构的村民组和房头快速弱化，不再具有集体行动能力。国家权力延伸到农户的原因很简单，就是过去需要农户出钱出力建设的村庄公共品，由国家下乡资源来进行建设了，不再需要农民出钱出力了，之前村民组或房头内的公共事务需求没有了，房头退化为一种文化现象和价值倾向，从治理层面返回到社会与文化的层面，也就是从“公”的领域退出而仅保留“私”的领域。

一旦国家权力进入到农户家门口，国家就直接与农户打交道。国家要为农民做好事，上项目，项目落地就要占用农户土地，农户就可能索要超出应得利益的好处。因为不损害其他农户的利益，钉子户索要超额好处就没有心理上、道德上的障碍，也没有舆论上的问题，因为国家好处不得白不得。国家就不得不与钉子户讨价还价，外来工程队就不得不与钉子户死缠死打。一户钉子户获利，其他农户迅即成为钉子户，村庄没有人有理由出来“唱黑脸”、“说直话”，以阻止钉子户效应。这样一来，国家发现好事不好做。

小结一下，过去在国家与农户之间实际上是存在着公私同构的村组（宗族或房头）的，现在国家借资源下乡，将权力直接延伸到农户家门口，不再有公私同构的村组这个缓冲带，之前“大私”范围内部解决的大量细小琐碎事务外溢出来，变成国家事务，由此造成新的治理困境。甚至调研乡镇，有一农户家中老人去世，村干部没有上门帮助，农户就到村部大闹，说村干部为什么不去帮他家处理丧事。而实际上过去办理丧事都是靠房头而不需要村干部帮助的。

二、

鄂东南地区是湖北宗族化程度最高的地区，相对鄂东南来讲，湖北省绝大多数农村宗族早已解体，是我们所说原子化农村，也就是说，作为大私的宗族房头早在建国初期就已经消灭或消失，村组建制都不再是依托宗族房头这样的大私，而是在地缘基础上，通过村社集体来建设地缘共同体。虽然缺少“大私”，人民公社时期“三级所有、队为基础”，生产队是农民共同生产与分配单位，自治程度相当高。分田到户以后，实行村民自治，农民要承担“三提五统”，要分摊共同生产费，国家很少介入到村庄内部事务。村庄自治就必须要将农民组织起来，筹资筹劳，出钱出力，就要依靠积极分子，团结大多数，孤立钉子户，以达成公共工程和公益事业建设上的集体行动。将农民组织起来的最重要办法是召开会议，讲清道理，形成共识，实行农民的自我教育、自我管理和自我服务。当然也要通过民主选举、民主决策、民主管理、民主监督。

简言之，在没有宗族房头的农村，通过村民自治来形成地缘基础上的村社共同体，村社共同体成为国家与农民之间的缓冲地带或联结纽带。

取消农业税后，国家不仅不再向农民收取税费，而且开始大量向农村转移资源，之前主要依靠村民出钱出力建设的村庄公共品，现在都由国家来建设，国家直接将服务延伸到农户家门口，农民再组织起来建设村庄公共品就没有必要。也是因此，之前通过自治来达到的自我教育、自我管理和自我服务就显得多余，村民自治就逐步被村级治理行政化所代替，村干部主要工作就是要完成上级布置的任务，而不是深入群众、动员群众、组织群众，村级组织也就逐步丧失了解决村庄小事的能力，村庄任何一件事情都可以直接上升到国家的权力层面，国家就不得不进一步介入到农户之间甚至农户内部的琐碎事务之中。

三、

国家为农民服务当然是很重要的，但是不应该包办代替，而必须要在国家与农户个体之间建立起一个缓冲性的结构，这个结构无论是小公大私同构的村民组（房头），还是组织起来的村社集体。不将农民组织起来，由国家直接面对一家一户农户，农户之间各种细小琐碎事务将极大地降低治理效率，结果就是好事不好办和好事办不好。

2024-12-08
韩琦：安第斯文明的起源：卡拉尔—苏佩

传统观点认为，南美安第斯文明的母文化是查文·德万塔尔文化（公元前1200—前200年）。但随着考古发掘取得新进展，卡拉尔—苏佩文明取代了前者地位，被认为是安第斯地区的第一个文明，其存在于公元前3000年至前1800年之间，清晰展现着秘鲁中北部地区第一个复杂社会的样貌。

20世纪中期以来，不断有考古学家对卡拉尔—苏佩遗址进行考察和研究，但直到1994年秘鲁圣马尔克斯大学的考古学家露丝·沙迪团队对苏佩河谷进行调查，并随之进行系统考古发掘，学界才对它形成新的认知。随着考古挖掘的深入和新成果的出版发表，卡拉尔—苏佩文明的古老性和重要性最终得到证实。2009年，卡拉尔—苏佩圣城被联合国教科文组织列入《世界遗产名录》。

卡拉尔—苏佩位于秘鲁海岸的中北部地区、利马以北约182公里。秘鲁中北部地区的面积为81497平方公里，包括圣塔、内佩纳、塞钦、库莱布拉斯、瓦尔梅、福塔雷萨等十几个沿海河谷。与其他世界文明中心相比，秘鲁海岸似乎不太可能成为文明发祥地，因为东部安第斯山脉和西部太平洋形成的反气旋作用导致这里极度干旱。然而，该地区有50多条从山脉到大海的河流穿过，利用这种水源发展的灌溉对卡拉尔—苏佩文明的出现发挥了决定性作用。

在众多河谷中，苏佩河谷在文明起源时期脱颖而出，仅在这一小盆地就发现了20多个可以被归属于同一时期的城市定居点，它们几乎都有公共建筑、圆形广场、住宅等，都有用土坯、石头、树干和植物纤维建成的阶梯式金字塔，其中还有雕像、马黛茶杯器、石器、棉纺织品、烧焦的食品及其他用品。从建筑规模看，卡拉尔城最大，城市布局分布有序，纪念性建筑种类繁多。其距离大海23公里，处于苏佩河谷中段的初始部分，被认为是该地区居民点的首都，被称为“圣城”。

早在公元前3000年之前，就有一些家族群体在苏佩河谷定居，他们建立集中居住区，疏干湿地，开辟农田，修建灌溉渠道。公元前3000年至公元前2600年，首都地区的城市定居点不断壮大，定居者们在空地上修建广场用于公共活动，并有了第一个金字塔。大约在公元前2600年至公元前2300年，人们对卡拉尔圣城进行整体设计，修建了金字塔和下沉式圆形广场。在公元前2300年至公元前2100年间，大型金字塔、广场等公共建筑的规模和体积都有所扩大。到公元前2100年至公元前1800年，由于劳动力的减少，定居者们用较小的石块改建公共建筑，最后掩埋了一些重要的建筑部分，卡拉尔圣城被废弃。

总体来看，卡拉尔—苏佩文明的主要特征表现在以下几个方面：

以农业生产、渔业生产和贸易交换为主要经济形式。苏佩河谷的居民发展出技术比较先进的集约化农业。他们使用简单的工具（如木棍和鹿角）来掘土，修建灌溉水渠以将河水引入农田。考古证据表明，他们已经懂得通过对各种植物品种的实验，来改善粮食和经济作物的种类、提高产量。他们种植的作物主要有：土豆、红薯、南瓜、豆类、花生、辣椒、玉米、葫芦、鳄梨、番石榴、马黛茶、烟草等，其中棉花是交易的主要产品。沿海居民则捕鱼并采集各种海洋生物，主要包括凤尾鱼、沙丁鱼、贻贝和蛤蜊等。农业和渔业形成一种长期的经济互补关系。

居民们通过以物易物的方式交换产品。沿海居民提供海产品，如离太平洋仅有500米的居民点阿斯佩罗被认为是卡拉尔的渔镇，那里的居民开发了包括使用钩子、麻线、船等在内的捕鱼技术，特别是发明了棉纤维渔网。渔民负责将海产品分发到河谷中的定居点，而河谷居民会给渔民提供所需的渔网和衣物、用作钓线的棉纤维、用作漂浮物的葫芦、制造船桨的木材以及水果蔬菜等，高地居民会提供农产品（粮食）和畜产品（羊驼）。这样，该区域形成一个类似专业化生产的贸易网络，而卡拉尔圣城无疑是这一网络的中心。很显然，这个网络还延伸到更远的地方，因为在卡拉尔—苏佩地区发现了来自高原的洛克木棒、秃鹰羽毛，亚马逊丛林的陆生蜗牛、灵长类动物皮、各种鸟类羽毛以及厄瓜多尔赤道海岸的多刺牡蛎。

灌溉技术的使用、渔网的发明以及活跃的贸易交换提高了生产力，促进了地区经济发展和生产剩余积累，从而使苏佩社会能够以地方政府的形式加强其政治一体化进程，这种政府形式的有效性可以从国家承担的大型纪念性建筑群建设中得到体现。

先进的城市规划和建筑。卡拉尔圣城拥有复杂的城市布局。该城占地66公顷，包括一个核心区和一个外围区。核心区包括32座公共建筑和一些住宅建筑群，外围区有一些住宅建筑群。核心区又分为两大部分，北部为上城，南部为下城。北部的公共建筑分为A、B、C三组，每组都有两个金字塔、广场、官员住房。其中B组的金字塔最大，长160米，宽150米，高18米，坐北朝南，背靠河谷，面向下沉式圆形广场，是卡拉尔城的主建筑。下城建筑有下沉式露天剧场、露天剧场神庙、长桌神庙、圆形祭坛神庙，以及平民住宅区等。

金字塔结构的墙壁上抹有泥土，被涂成白色或浅黄色，偶尔涂成红色。每座金字塔都有一个通向顶部的中央阶梯，其上有几个房间。在主房间都有一个圣火祭坛，祭坛中央有一个火炉，火炉下方配有导风的地下管道。圣火祭坛具有仪式功能，被用于火化各种祭品。

卡拉尔位于地震活跃区，其建造者使用“希克拉斯”技术，即将石块装在芦苇纤维编织的网格袋中，尺寸和重量各不相同，但非常均匀，有一定的松散度，用它们来支撑挡土墙，填充金字塔。这样，当发生强烈地震时，“希克拉斯”会以有限的方式微动，发挥着柔性地基作用，由此实现建筑物的结构稳定。规模宏大的城市和坚固的建筑表明卡拉尔人已经具有先进的组织能力和工程技术。

社会分层和阶级分化已经出现。卡拉尔—苏佩文明显示出复杂的社会结构，已经出现明显的社会分层。如从事体力劳动的生产者，包括渔民、农民、工匠；精英阶层，包括商人、定居点的领导者和祭司。精英们不再直接为自己的生计进行生产，而是致力于专门的活动，如加强远距离贸易；进行天文观测来测量时间和制定历法；在公共活动的建筑施工中试验和应用算术与几何知识；举行仪式和献祭活动。

考古发现揭示出精英阶层和普通民众存在较为明显的区分。城市中心各区的公共建筑和住宅建筑在位置、大小和所用材料上都有区别；服装穿着方式和个人佩戴的饰品上，如男性权威人士的项链和大耳环，女性的项链和头巾，也体现了社会区别。一些装饰品、项链是用从遥远的地方（如厄瓜多尔海岸）所获材料制作的，专供少数社会上层人物使用。

中央集权的国家雏形已经显露。苏佩河谷的人口分布在苏佩河两岸被称作“帕查卡”的城市定居点中，这些定居点的规模和建筑体量各不相同。每个“帕查卡”都由几个“艾柳”组成，这些“艾柳”是通过亲缘关系结合在一起的族群，拥有相同的祖先，通过祖先来确定身份，并由族长领导。族长中有一个主要首领——库拉卡，负责指挥全体居民。这种政府制度在苏佩河谷20多个城市定居点中运行，由于卡拉尔居于核心地位，它发挥领导和组织其他城市定居点的作用，形成一个广泛而有序的互惠、交流网络。

卡拉尔是一座和平与和谐之城，考古发掘中没有发现战争的痕迹，没有防御城墙，没有武器，没有残缺不全的尸体，这与通过战争产生国家的理论解释有所不同。美国考古学家乔纳森·哈斯认为，卡拉尔人进行了人类建立政府的实验，他们将个人自由交给一个中央集权机构，由中央集权机构决定创建一个作为仪式中心的城市，并要求大家为共同或更大的利益努力工作。人们之所以选择成立“中央政府”，是因为意识到合作将使个人和整个社区受益。考古学家露丝·沙迪认为，对神的崇拜是凝聚力和社会平衡背后的驱动力。人们之所以接受中央集权政府的存在，是因为他们相信统治者可以在人与生者的社会和神与死者的社会之间进行调解，政府的管理对于保证生活是必要的。卡拉尔社会展现出一定的复杂性，种种迹象表明，卡拉尔不仅仅是一个简单的农业社会，而且是一个具有一定组织能力和复杂结构的社会实体，已经具备早期国家的基本要素。

宗教作为意识形态与政治权力相结合。卡拉尔的金字塔、广场和祭坛等雄伟建筑不仅是宗教仪式的场所，也是社会和政治活动的中心。金字塔象征着与天界的联系，广场则是集体仪式和庆典的场所。卡拉尔人信奉多神教，崇拜多种神灵。这些神灵与自然现象、农业、天气和其他重要生活领域有关。祭祀活动在卡拉尔占据重要地位。人们通过圣火祭坛进行各种形式的献祭，包括毛发、珠子、石英碎片、骨器、木器、纺织品、鱼类、贝类等，这些被认为是向神灵表达敬意和请求庇护的方式。统治者和祭司被视为祖先和神灵的代表或中介，他们通过控制宗教仪式、祭祀活动和宗教建筑来巩固自己的权威。卡拉尔的宗教活动是在音乐伴奏中进行的，在这里出土了一套由秃鹰和鹈鹕翼骨制成并绘有鸟类和猴子图案的横笛（共32支），一套由骆马骨和鹿骨制成的号角（共38支），一套由芦苇和棉线制作的排笛。在没有军事力量的情况下，宗教成为卡拉尔统治者凝聚和控制社会的力量，它使卡拉尔—苏佩河谷的居民团结起来。

科技知识在文明发展中发挥重要作用。卡拉尔人开发了先进的农业灌溉系统，修建水渠和水库，这对于他们在干旱环境中维持农业生产至关重要。在设计和建造大型纪念性建筑以及修建灌溉水渠时，显然运用了算术和几何知识。有证据表明，卡拉尔人已经具备天文学知识，并将其应用到与经济、宗教活动有关的历法制定中。在卡拉尔上城C组的公共广场中央竖立着一块巨石，是当时用来观测天文的。他们已经发明一种记录信息的工具系统，如在上城C组的画廊金字塔中，考古学家发现一件纺织品遗物，被认为是“基普”，即用作记录工具的一套打结绳线。同时，在上城B组小金字塔的三个石块上还发现了基普的图画。这说明卡拉尔人已经在使用基普，比印加人早数千年之久。考古学家还发现，一些药用植物多次出现在墓葬中，表明卡拉尔人已经了解一些植物的药用价值。在纺织技术方面，他们利用棉花纤维编织连衣裙，采用穿插和缠绕的方法，还制作了渔网、鞋类、包类、绳索等。圣火祭坛下方建造的地下通风系统，能够引导风力保持火焰燃烧，并将烟雾排到室外。需要指出的是，虽然早在公元前4000年厄瓜多尔的瓦尔迪维亚等地就已经开始生产陶器，但卡拉尔人并没有使用或自己生产陶器。他们用葫芦作为器皿，用木头雕刻勺子，用石头雕制盘子。因此，卡拉尔文明属于“前陶瓷”文明，这一点已被考古学家们认定。

由于强烈地震和灾难性气候变化，卡拉尔—苏佩文明在公元前1800年左右被遗弃。虽然如此，它在农业、城市建筑、社会政治组织、宗教文化等方面对后来安第斯文明的发展产生深远影响。可以说，卡拉尔—苏佩文明是安第斯文明的摇篮。

本文转自《光明日报》2024年11月25日

2024-12-07

JEFFREY DING《Technology and the Rise of Great Powers：HOW DIFFUSION SHAPES ECONOMIC COMPETITION》

CONTENTS
1 Introduction
2 GPT Diffusion Theory
3 The First Industrial Revolution and Britain’s Rise
4 The Second Industrial Revolution and America’s Ascent
5 Japan’s Challenge in the Third Industrial Revolution
6 A Statistical Analysis of Software Engineering Skill Infrastructure and Computerization
7 US-China Competition in AI and the Fourth Industrial Revolution
8 Conclusion

1 Introduction

IN JULY 2018, the BRICS nations (Brazil, Russia, India, China, and South Africa) convened in Johannesburg around a specific, noteworthy theme: “Collaboration for Inclusive Growth and Shared Prosperity in the Fourth Industrial Revolution.” The theme was noteworthy in part because of its specificity. Previous iterations of the BRICS summit, which gathers five nations that account for about 40 percent of the world’s population and 25 percent of the world’s GDP,¹ had tackled fuzzy slogans such as “Stronger Partnership for a Brighter Future” and “Broad Vision, Shared Prosperity.” What stood out not only about that year’s theme but also in comments by BRICS leaders at the summit was an unambiguous conviction that the world was undergoing a momentous season of technological change—one warranting the title “Fourth Industrial Revolution.”²

Throughout the gathering, leaders of these five major emerging economies declared that the ongoing technological transition represented a rare opportunity for accelerating economic growth. When Chinese president Xi Jinping addressed the four other leaders of major emerging economies, he laid out the historical stakes of that belief:

From the mechanization of the first industrial revolution in the 18th century, to the electrification of the second industrial revolution in the 19th century, to the informatization of the third industrial revolution in the 20th century, rounds of disruptive technological innovation have … fundamentally changed the development trajectory of human history.³

Citing recent breakthroughs in cutting-edge technologies like artificial intelligence (AI), Xi proclaimed, “Today, we are experiencing a larger and deeper round of technological revolution and industrial transformation.”⁴

While the BRICS summit did not explicitly address how the Fourth Industrial Revolution could reshape the international economic order, the implications of Xi’s remarks loomed in the backdrop. In the following months, Chinese analysts and scholars expanded upon them, especially the connection he drew between technological disruption and global leadership transitions.⁵ One commentary on Xi’s speech, published on the website of the authoritative Chinese Communist Party publication Study Times, detailed the geopolitical consequences of past technological revolutions: “Britain seized the opportunity of the first industrial revolution and established a world-leading productivity advantage.… After the second industrial revolution, the United States seized the dominance of advanced productivity from Britain.”⁶ In his analysis of Xi’s address, Professor Jin Canrong of Renmin University, an influential Chinese international relations scholar, argued that China has a better chance than the United States of winning the competition over the Fourth Industrial Revolution.⁷

This broad sketch of power transition by way of technological revolution also resonates with US policymakers and leading thinkers. In his first press conference after taking office, President Joe Biden underscored the need to “own the future” as it relates to competition in emerging technologies, pledging that China’s goal to become “the most powerful country in the world” was “not going to happen on [his] watch.”⁸ In 2018, the US Congress stood up the National Security Commission on Artificial Intelligence (NSCAI), an influential body that convened leading government officials, technology experts, and social scientists to study the national security implications of AI. Comparing AI’s possible impact to past technologies like electricity, the NSCAI’s 756-page final report warned that the United States would soon lose its technological leadership to China if it did not adequately prepare for the “AI revolution.”⁹

Caught up in the latest technical advances coming out of Silicon Valley or Beijing’s Zhongguancun, these sweeping narratives disregard the process by which emerging technologies can influence a power transition. How do technological revolutions affect the rise and fall of great powers? Is there a discernible pattern that characterizes how previous industrial revolutions shaped the global balance of power? If such a pattern exists, how would it inform our understanding of the Fourth Industrial Revolution and US-China technological competition?

Conventional Wisdom on Technology-Driven Power Transitions

International relations scholars have long observed the link between disruptive technological breakthroughs and the rise and fall of great powers.¹⁰ At a general level, as Yale historian Paul Kennedy has established, this process involves “differentials in growth rates and technological change, leading to shifts in the global economic balances, which in turn gradually impinge upon the political and military balances.”¹¹ Yet, as is the case with present-day speculation about the effects of new technologies on the US-China power balance, largely missing from the international relations literature is an explanation of how technological change creates the conditions for a great power to leapfrog its rival. Scholars have carefully scrutinized how shifts in economic balances affect global military power and political leadership, but there is a need for further investigation into the very first step of Kennedy’s causal chain: the link between technological change and differentials in long-term growth rates among great powers.¹²

Among studies that do examine the mechanics of how technological change shapes economic power transitions, the standard explanation stresses dominance over critical technological innovations in new, fast-growing industries (“leading sectors”). Britain became the world’s most productive economy, according to this logic, because it was home to new advances that transformed its burgeoning textile industry, such as James Hargreaves’s spinning jenny. In the same vein, Germany’s mastery of major breakthroughs in the chemical industry is seen as pivotal to its subsequent challenge to British economic leadership. Informed by historical analysis, the leading-sector (LS) perspective posits that, during major technological shifts, the global balance of economic power tips toward “the states which were the first to introduce the most important innovations.”¹³

Why do the benefits of leading sectors accrue to certain countries? Explanations vary, but most stress the goodness-of-fit between a nation’s domestic institutions and the demands of disruptive technologies. At a general level, some scholars argue that rising powers quickly adapt to new leading sectors because they are unburdened by the vested interests that have built up in more established powers.¹⁴ Others point to more specific factors, including the degree of government centralization or sectoral governance arrangements.¹⁵ Common to all these perspectives is a focus on the institutions that allow one country to first introduce major breakthroughs in an emerging industry. In the case of Britain’s rise, for example, many influential histories highlight institutions that supported “heroic” inventors.¹⁶ Likewise, accounts of Germany’s success with leading sectors focus on its investments in scientific education and industrial research laboratories.¹⁷

The broad outlines of LS theory exert substantial influence in academic and policymaking circles. Field-defining texts, including works by Robert Gilpin and Paul Kennedy, use the LS model to map out the rise and fall of great powers.¹⁸ In a review of international relations scholarship, Daniel Drezner summarizes their conclusions: “Historically, a great power has acquired hegemon status through a near-monopoly on innovation in leading sectors.”¹⁹

The LS template also informs contemporary discussion of China’s challenge to US technological leadership. In another speech about how China could leverage this new round of industrial revolution to become a “science and technology superpower,” President Xi called for China to develop into “the world’s primary center for science and high ground for innovation.”²⁰ As US policymakers confront China’s growing strength in emerging technologies like AI, they also frame the competition in terms of which country will be able to generate radical advances in new leading sectors.²¹

Who did it first? Which country innovated it first? Presented with technical breakthroughs that inspire astonishment, it is only natural to gravitate toward the moment of initial discovery. When today’s leaders evoke past industrial revolutions, as Xi did in his speech to the BRICS nations, they tap into historical accounts of technological progress that also center the moment of innovation.²² The economist and historian Nathan Rosenberg diagnoses the problem with these innovation-centric perspectives: “Much less attention … if any at all, has been accorded to the rate at which new technologies have been adopted and embedded in the productive process. Indeed the diffusion process has often been assumed out of existence.”²³ Yet, without the humble undertaking of diffusion, even the most extraordinary advances will not matter.

Taking diffusion seriously leads to a different explanation for how technological revolutions affect the rise and fall of great powers. A diffusion-centric framework probes what comes after the hype. Less concerned with which state first introduced major innovations, it instead asks why some states were more successful at adapting and embracing new technologies at scale. As outlined in the next section, this alternative pathway points toward a different set of institutional factors that underpin leadership in times of technological change, in particular institutions that widen the base of engineering skills and knowledge linked to foundational technologies.

GPT Diffusion Theory

In September 2020, the Guardian published an opinion piece arguing that humans should not fear new breakthroughs in AI. Noting that “Stephen Hawking has warned that AI could ‘spell the end of the human race,’ ” the article’s “author” contends that “I am here to convince you not to worry. Artificial intelligence will not destroy humans. Believe me.”²⁴ If one came away from this piece with the feeling that the author had a rose-tinted view of the future of AI, it would be a perfectly reasonable judgment. After all, the author was GPT-3, an AI model that can understand and produce humanlike text.

Released earlier that year by OpenAI, a San Francisco–based AI lab, GPT-3 surprised everyone—including its designers—with its versatility. In addition to generating poetry and essays like the Guardian op-ed from scratch, early users demonstrated GPT-3’s impressive capabilities in writing code, translating languages, and building chatbots.²⁵ Six months after its launch, one compilation listed sixty-six unique use cases of GPT-3, which ranged from automatically updating spreadsheets to generating website landing pages.²⁶ Two years later, OpenAI’s acclaimed ChatGPT model, built on an improved version of GPT-3, would set the internet aflame with its wide-ranging capabilities.²⁷

While the name “GPT-3” derives from a class of language models known as “generative pre-trained transformers,” the abbreviation, coincidentally, also speaks to the broader significance of recent breakthroughs in AI: the possible arrival of the next general-purpose technology (GPT). Foundational breakthroughs in the ability of computers to perform tasks that usually require human intelligence have the potential to transform countless industries. Hence, scholars and policymakers often compare advances in AI to electricity, the prototypical GPT.²⁸ As Kevin Kelly, the former editor of WIRED, once put it, “Everything that we formerly electrified we will now cognitize … business plans of the next 10,000 startups are easy to forecast: Take X and add AI.”²⁹

In this book, I argue that patterns in how GPTs diffuse throughout the economy illuminate a novel explanation for how and when technological changes affect power transitions. The emergence of GPTs—fundamental advances that can transform many application sectors—provides an opening for major shifts in economic leadership. Characterized by their scope for continuous improvement, pervasive applicability across the economy, and synergies with other technological advances, GPTs carry an immense potential for boosting productivity.³⁰ Carefully tracking how the various applications of GPTs are adopted across various industries, a process I refer to as “GPT diffusion,” is essential to understanding how technological revolutions disrupt economic power balances.

Based on the experience of past GPTs, this potential productivity boost comes with one notable caveat: the full impact of a GPT manifests only after a gradual process of diffusion into pervasive use.³¹ GPTs demand structural changes across a range of technology systems, which involve complementary innovations, organizational adaptations, and workforce adjustments.³² For example, electrification’s boost to productivity materialized about five decades after the introduction of the first electric dynamo, occurring only after factories had restructured their layouts and there had been interrelated breakthroughs in steam turbines.³³ Fittingly, after the release of GPT-3, OpenAI CEO Sam Altman alluded to this extended trajectory: “The GPT-3 hype is way too much … it still has serious weaknesses and sometimes makes very silly mistakes. AI is going to change the world, but GPT-3 is just a very early glimpse. We have a lot still to figure out.”³⁴

Informed by historical patterns of GPT diffusion, my explanation for technology-driven power transitions diverges significantly from the standard LS account. Specifically, these two causal mechanisms differ along three key dimensions, which relate to the technological revolution’s impact timeframe, phase of relative advantage, and breadth of growth. First, while the GPT mechanism involves a protracted gestation period between a GPT’s emergence and resulting productivity boosts, the LS mechanism assumes that there is only a brief window during which countries can capture profits in leading sectors. “The greatest marginal stimulation to growth may therefore come early in the sector’s development at the time when the sector itself is expanding rapidly,” William Thompson reasons.³⁵ By contrast, the most pronounced effects on growth arrive late in a GPT’s development.

Second, the GPT and LS mechanisms also assign disparate weights to innovation and diffusion. Technological change involves a phase when the technology is first incubated as a viable commercial application (“innovation”) and a phase when the innovation permeates across a population of potential users (“diffusion”). The LS mechanism is primarily concerned about which country dominates innovation in leading sectors, capturing the accompanying monopoly profits.³⁶ Under the GPT mechanism, successful adaptation to technological revolutions is less about being the first to introduce major innovations and more about effectively adopting GPTs across a wide range of economic sectors.

Third, regarding the breadth of technological transformation and economic growth, the LS mechanism focuses on the contributions of a limited number of leading sectors and new industries to economic growth in a particular period.³⁷ In contrast, GPT-fueled productivity growth is spread across a broad range of industries.³⁸ Dispersed productivity increases from many industries and sectors come from the extension and generalization of localized advances in GPTs.³⁹ Thus, the LS mechanism expects the breadth of growth in a particular period to be concentrated in leading sectors, whereas the GPT mechanism expects technological complementarities to be dispersed across many sectors.

A clearer understanding of the contours of technological change in times of economic power transition informs which institutional variables matter most. If the LS trajectory holds, then the most important institutional endowments and responses are those that support a monopoly on innovation in leading sectors. In the context of skill formation, institutional competencies in science and basic research gain priority. For instance, the conventional explanation of Germany’s industrial rise in the late nineteenth century attributes its technological leadership to investments in industrial research labs and highly skilled chemists. These supported Germany’s dominance of the chemical industry, a key LS of the period.⁴⁰

The impact pathway of GPTs brings another set of institutional complementarities to the fore. GPT diffusion theory highlights the importance of “GPT skill infrastructure”: education and training systems that widen the pool of engineering skills and knowledge linked to a GPT. When widespread adoption of GPTs is the priority, it is ordinary engineers, not heroic inventors, who matter most. Widening the base of engineering skills associated with a GPT cultivates a more interconnected technological system, spurring cross-fertilization between institutions optimized for applied technology and those oriented toward foundational research.⁴¹

Returning to the example of late-nineteenth-century advances in chemicals, GPT diffusion spotlights institutional adjustments that differ from those of the LS mechanism. In a decades-long process, innovations in chemical engineering practices gradually enabled the chemicalization of procedures common to many industries beyond synthetic dyes, a sector that Germany controlled. Despite trailing Germany in the capacity to produce elite chemists and frontier chemical research, the United States was more effective at adapting to chemicalization because it first institutionalized the discipline of chemical engineering.⁴²

Of course, since GPT diffusion depends on factors aside from human capital, GPT skill infrastructure represents one of many institutional forces at work. Standards-setting organizations, financing bodies, and the competitiveness of markets can all influence the flow of information between the GPT domain and application sectors.⁴³ Since institutions of skill formation produce impacts that spill over into and complement other institutional arrangements, they comprise the focus of my analysis.⁴⁴

Assessing GPT Diffusion across Industrial Revolutions

To test this argument, I employ a mixed-methods approach that pairs qualitative historical analysis with quantitative methods. Historical case studies permit me to thoroughly trace the interactions between technologies and institutions among great powers in previous industrial revolutions. I then explore the generalizability of GPT diffusion theory beyond the chosen set of great powers. Using data on nineteen countries from 1995 to 2020, I analyze the theorized connection between GPT skill infrastructure in software engineering and computerization rates.

To investigate the causal processes that connect technological changes to economic power transitions, I set the LS mechanism against the GPT diffusion mechanism across three historical case studies: Britain’s rise to preeminence in the First Industrial Revolution (IR-1); America’s and Germany’s overtaking of Britain in the Second Industrial Revolution (IR-2); and Japan’s challenge to America’s technological dominance in the Third Industrial Revolution (IR-3), or what is sometimes called the “information revolution.” This case setup allows for a fair and decisive assessment of the explanatory relevance of GPT diffusion theory in comparison to LS theory. Because the IR-1 and IR-2 function as typical cases where the cause and outcome are clearly present, they are ideal for developing and testing mechanism-based theories.⁴⁵ The IR-3, a deviant case in that a technological revolution is not followed by an economic power transition, provides a different but still useful way to compare the two mechanisms.

The IR-1 (1780–1840) is a paradigmatic case of technology-driven power transition. It is well established that the IR-1’s technological advances propelled Great Britain to unrivaled economic supremacy. As for the specific causal pathway, international relations scholarship tends to attribute Britain’s rise to its monopoly over innovation in cotton textiles and other leading sectors. According to these accounts, Britain’s technological leadership in the IR-1 sprang from its institutional capacity to nurture genius inventors in these sectors. Since the publication of these field-defining works, economic and technology historians have uncovered that the impacts on British industrialization of the two most prominent areas of technological change, cotton textiles and iron, followed different trajectories. Often relying on formal econometric methods to understand the impact of key technologies, these historical accounts question the prevailing narrative of the IR-1.

The IR-2 (1870–1914) supplies another opportunity to pit GPT diffusion theory against the LS account. International relations scholars interpret the IR-2 as a case in which Britain’s rivals challenged its economic leadership because they first introduced significant technological advances in leading sectors. Particular emphasis is placed on Germany’s ability to corner market shares in chemicals, which is linked to its strengths in scientific education and industrial research institutions. More granular data on cross-national differences in engineering education suggest that the U.S. technological advantage rested on the country’s wide base of mechanical engineers. Combined with detailed tracing of the pace and extent of technology adoption during this period, this chapter’s evidence suggests modifications to conventional understandings of the IR-2.

In the IR-3 (1960–2000), fundamental breakthroughs in information and communication technologies presented another opening for a shift in economic leadership. During this period, prominent thinkers warned that Japan’s lead in industries experiencing rapid technological change, including semiconductors and consumer electronics, would threaten U.S. economic leadership. Influential scholars and policymakers advocated for the United States to adopt Japan’s keiretsu system of industrial organization and its aggressive industrial policy approach. Ultimately, Japan’s productivity growth stalled in the 1990s. Given the absence of an economic power transition, the primary function of the IR-3 case therefore is to provide disconfirming evidence of the two explanations. If the components of the LS mechanism were present, then the fact that an economic power transition did not occur would damage the credibility of the LS mechanism. The same condition applies to GPT diffusion theory.

In each of the cases, I follow the same standardized procedures. First, I test three pairs of competing propositions about the key technological trajectories, derived from the different expectations of the LS and GPT mechanisms related to the impact timeframe, phase of relative advantage, and breadth of growth. Then, depending on whether the LS or GPT trajectory better accords with the historical evidence, I analyze the goodness-of-fit between the institutional competencies of leading industrial powers and the prevailing trajectory. For instance, if an industrial revolution is better characterized by the GPT trajectory, then the corresponding case analysis should show that differences in GPT skill infrastructure determine which powers rise and fall. Although I primarily distinguish GPT diffusion theory from the LS model, I also examine alternative factors unique to the particular case, as well as two other prominent explanations of how advanced economies differentially benefit from technological changes (the varieties of capitalism and threat-based approaches).

The historical case analysis supports the explanatory power of the GPT mechanism over the LS mechanism. In all three periods, technological changes affected the rise and fall of great powers in a gradual, decades-long impact pathway that advantaged those that effectively diffused GPTs across a broad range of sectors. Education and training systems that cultivated broad pools of engineering skills proved crucial to GPT diffusion.

Evaluating these two competing explanations requires a clear understanding of the cause and outcome that bracket both the GPT and LS mechanisms. The hypothesized cause is a “technological revolution,” or a period characterized by particularly disruptive technological advances.⁴⁶ Since the shape of technological change is uneven, not all improvements in useful knowledge are relevant for power transitions.⁴⁷ However, some extraordinary clusters of technological breakthroughs, often deemed industrial revolutions by historians, do have ramifications for the rise and fall of great powers.⁴⁸ I am primarily interested in the pathway by which these technological revolutions influence the global distribution of power.

The outcome variable of interest is an economic power transition, in which one great power sustains productivity growth at higher levels than its rivals. The balance of power can shift in many ways; here I focus on relative economic growth rates because they are catalysts for intensifying hegemonic rivalries.⁴⁹ Productivity growth, in particular, determines economic growth over the long run. Unique in its fungibility with other forms of power, sustained economic growth is central to a state’s ability to exert political and military influence. As demonstrated by the outcomes of interstate conflicts between great powers, economic and productive capacity is the foundation of military power.⁵⁰

Lastly, the quantitative analysis supplements the historical case studies by scrutinizing the generalizability of GPT diffusion theory outside of great powers. A key observable implication of my argument is that the rate at which a GPT spreads throughout the economy owes much to that country’s institutional capacity to widen the pool of pertinent engineering skills and knowledge. Using a novel method to estimate the breadth of software engineering education at a cross-national level, I analyze the theorized connection between GPT skill infrastructure and computerization rates across nineteen advanced and emerging economies from 1995 to 2020. I supplement my time-series cross-sectional models with a duration analysis and cross-sectional regressions. Robust to many alternative specifications, my results show that, at least for computing technology, advanced economies that have higher levels of GPT skill infrastructure preside over higher rates of GPT diffusion.
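As a rough illustration of what a time-series cross-sectional comparison involves, the sketch below computes a within-country (fixed-effects) slope by demeaning each country's observations before relating the two variables. It is a minimal sketch, not the book's specification: the function name `within_slope`, the two-country panel, and every number in it are hypothetical and chosen only so the arithmetic is easy to follow.

```python
from collections import defaultdict


def within_slope(rows):
    """Return the fixed-effects (within-country) slope of y on x.

    rows is an iterable of (country, x, y) tuples. Here x stands in for a
    skill-infrastructure measure and y for a computerization rate; both
    labels are hypothetical.
    """
    groups = defaultdict(list)
    for country, x, y in rows:
        groups[country].append((x, y))

    sum_xy = 0.0
    sum_xx = 0.0
    for observations in groups.values():
        # Demeaning removes each country's time-invariant level.
        mean_x = sum(x for x, _ in observations) / len(observations)
        mean_y = sum(y for _, y in observations) / len(observations)
        for x, y in observations:
            sum_xy += (x - mean_x) * (y - mean_y)
            sum_xx += (x - mean_x) ** 2

    if sum_xx == 0:
        raise ValueError("x does not vary within any country")
    return sum_xy / sum_xx


# Made-up panel: each country has its own baseline, but y rises by 2 per unit of x.
panel = [
    ("A", 1, 12), ("A", 2, 14), ("A", 3, 16),
    ("B", 2, 54), ("B", 4, 58), ("B", 6, 62),
]
print(within_slope(panel))
```

Demeaning by country is one common way to keep fixed national characteristics from driving the estimate. The models reported in the book may differ in controls, estimator, and data.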

Key Contributions

The book makes several contributions to scholarship on power transitions and the effects of technological change on international politics. First, it puts forward a novel explanation for how and when significant technological breakthroughs generate a power transition in the international system. GPT diffusion theory revises the dominant theory based on leading sectors, which holds significant sway over academic and policymaking circles. By deepening our understanding of how technological revolutions influence shifts in economic leadership, this book also contributes to long-standing debates about the causes of power transitions.⁵¹

Second, the findings of this book bear directly on present-day technological competition between the United States and China. Emphasizing where fundamental breakthroughs are first seeded, the LS template strongly informs not only assessments of the US-China competition for technological leadership but also the ways in which leading policymakers in both countries formulate technology strategies. It is no coincidence that the three cases in this study match the three technological revolutions referenced by Chinese president Xi in his speech on the IR-4 to the BRICS summit.

As chapter 7 explores in detail, GPT diffusion theory suggests that Xi, along with other leading policymakers and thinkers in both the United States and China, has learned the wrong lessons from previous industrial revolutions. If the IR-4 reshapes the economic power balance, the impact will materialize through a protracted period during which a GPT, such as AI, acquires a variety of uses in a wide range of productive processes. GPT skill infrastructure, not the flashy efforts to secure the high ground in innovation, will decide which nation owns the future in the IR-4.

Beyond power transitions, Technology and the Rise of Great Powers serves as a template for studying the politics of emerging technologies. An enduring dilemma is that scholars either assign too much weight to technological change or underestimate the effects of new technologies.⁵² Approaches that emphasize the social shaping of technology neglect that not all technologies are created equal, whereas technologically deterministic approaches discount the influence of political factors on technological development. By first distinguishing GPTs, together with their pattern of diffusion, from other technologies and technological trajectories, and then showing how social and political factors shape the pace and direction of GPT diffusion, my approach demonstrates a middle way forward.

Roadmap for the Book

The book proceeds as follows. Chapter 2 fleshes out the key differences between GPT diffusion theory and the LS-based account, as well as the case analysis procedures and selection strategy that allow me to systematically evaluate these two causal mechanisms. The bulk of the evidence follows in three case studies that trace how technological progress affected economic power transitions in the First, Second, and Third Industrial Revolutions.

The first two case studies, the IR-1 and IR-2, show that a gap in the adoption of GPTs, as opposed to monopoly profits from dominating LS innovations, was the crucial driver of an economic power transition. In both cases, the country that outpaced its industrial rivals made institutional adjustments to cultivate engineering skills related to the key GPT of the period. The IR-1 case, discussed in chapter 3, reveals that Britain was the most successful in fostering a wide pool of machinists who enabled the widespread diffusion of advances in iron machinery. In considering the IR-2 case, chapter 4 highlights how the United States surpassed Britain as the preeminent economic power by fostering a wide base of mechanical engineering talent to spread interchangeable manufacturing methods.

The IR-3 case, presented in chapter 5, demonstrates that technological revolutions do not always produce an economic power transition. The fact that Japan did not overtake the United States as the economic leader would provide disconfirming evidence of both the LS and GPT mechanisms, if the components of these mechanisms were present. In the case of the LS mechanism, Japan did dominate innovations in the IR-3’s leading sectors, including consumer electronics and semiconductor components. In contrast, the IR-3 does not discredit the GPT mechanism because Japan did not lead the United States in the diffusion of information and communications technology across a wide variety of economic sectors.

Chapter 6 uses large-n quantitative analysis to explore how GPT diffusion applies beyond great powers. Chapter 7 applies the GPT diffusion framework to the implications of modern technological breakthroughs for the US-China power balance. Focusing on AI technology as the next GPT that could transform the international balance of power, I explore the extent to which my findings generalize to the contemporary US-China case. I conclude in chapter 8 by underscoring the broader ramifications of the book.

2 GPT Diffusion Theory

How and when do technological changes affect the rise and fall of great powers? Specifically, how do significant technological breakthroughs result in differential rates of economic growth among great powers? International relations scholars have long observed that rounds of technological revolution lead to upheaval in global economic leadership, bringing about a power transition in the international system. However, few studies explore how this process occurs.

Those that do tend to fixate on the most dramatic aspects of technological change—the eureka moments and first implementations of radical inventions. Consequently, the standard account of technology-driven power transitions stresses a country’s ability to dominate innovation in leading sectors. By exploiting brief windows in which to monopolize profits in new industries, the country that dominates innovation in these sectors rises to become the world’s most productive economy. Explanations vary regarding why the benefits of leading sectors tend to accrue in certain nations. Some scholars argue that national systems of political economy that accommodate rising challengers can more readily accept and support new industries. Leading economies, by contrast, are victims of their past success, burdened by powerful vested interests that resist adaptation to disruptive technologies.¹ Other studies point to more specific institutional factors that account for why some countries monopolize leading sectors, such as the degree of government centralization or industrial governance structures.²

An alternative explanation, based on the diffusion of general-purpose technologies (GPTs), draws attention to the less spectacular process by which fundamental innovations gradually diffuse throughout many industries. The rate and scope of diffusion is particularly relevant for GPTs, which are distinguished by their scope for continual improvement, broad applicability across many sectors, and synergies with other technological advances. Recognized by economists and historians as “engines of growth,” GPTs hold immense potential for boosting productivity.³ Realizing this promise, however, necessitates major structural changes across the technology systems linked to the GPT, including complementary innovations, organizational changes, and an upgrading of technical skills. Thus, GPTs lead to a productivity boost only after a “gradual and protracted process of diffusion into widespread use.”⁴ This is why more than five decades passed before key innovations in electricity, the quintessential GPT, significantly transformed manufacturing productivity.⁵

The process of GPT diffusion illuminates a pathway from technological change to power transition that diverges from the LS account (figure 2.1). Under the GPT mechanism, some great powers sustain economic growth at higher levels than their rivals do because, during a gradual process spanning decades, they more intensively adopt GPTs across a broad range of industries. This is analogous to a marathon run on a wide road. The LS mechanism, in contrast, specifies that one great power rises to economic leadership because it dominates innovations in a limited set of leading sectors and captures the accompanying monopoly profits. This is more like a sprint through a narrow running lane.

Why are some countries more successful at GPT diffusion? Building from scholarship arguing that a nation’s success in adapting to emerging technologies is determined by the fit between its institutions and the demands of evolving technologies, I argue that the GPT diffusion pathway informs the institutional adaptations crucial to success in technological revolutions.⁶ Unlike institutions oriented toward cornering profits in leading sectors, those optimized for GPT diffusion help standardize and spread novel best practices between the GPT sector and application sectors. Education and training systems that widen the base of engineering skills and knowledge linked to new GPTs, or what I call “GPT skill infrastructure,” are essential to all of these institutions.

FIGURE 2.1. Causal Diagrams for LS and GPT Mechanisms

The differences between these two theories of technological change and power transition are made clear when one country excels in the institutional competencies for LS product cycles but does not dominate GPT diffusion. Take, for example, chemical innovations and Germany’s economic rise in the late nineteenth century. Germany dominated major innovations in chemicals and captured nearly 90 percent of all global exports of synthetic dyestuffs.⁷ In line with the LS mechanism, this success was backed by Germany’s investments in building R&D labs and training doctoral students in chemistry, as well as a system of industrial organization that facilitated the rise of three chemical giants.⁸ Yet it was the United States that held an advantage in adopting basic chemical processes across many industries. As expected by GPT diffusion theory, the United States held institutional advantages in widening the base of engineering skills and knowledge necessary for chemicalization on a wide scale.⁹ This is when the ordinary tweakers and the implementers come to the fore, and the star scientists and inventors recede to the background.¹⁰

The rest of this chapter fleshes out my theoretical framework. It first clarifies the outcome I seek to explain: an economic power transition in which one great power becomes the economic leader by sustaining productivity growth at higher levels than its rivals. The starting point of my argument is that the diffusion of GPTs is central to the relationship between technological change and productivity leadership. This chapter explicates this argument by justifying the emphasis on both GPTs and diffusion, highlighting the differences between the GPT and LS mechanisms. It then extends the analysis to the institutional competencies that synergize with GPT trajectories. From the rich set of technology-institution interactions identified by evolutionary economists and comparative institutionalists, I justify my focus on institutions that enable countries to widen the skill base required to spread GPTs across industries. After differentiating my argument from alternative explanations, the chapter closes with a description of the book’s research methodology.¹¹

The Outcome: Long-Term Economic Growth Differentials and Power Transitions

Power transitions are to the international system as earthquakes are to the geological landscape. Shifts in the relative power of leading nations send shock waves throughout the international system. What often follows is conflict, the most devastating form of which is a war waged by coalitions of great powers for hegemony over the globe.¹² Beyond heightened risks of conflict, the aftershocks of power transitions reverberate in the architecture of the international order as victorious powers remake international institutions in their own images.¹³

While the power transition literature largely tackles the consequences of power transitions, I treat the rise and fall of great powers as the outcome to be explained. This follows David Baldwin’s instruction for international relations scholars “to devote more attention to treating power as a dependent variable and less to treating it as an independent variable.”¹⁴ Specifically, I explore the causes of “economic power transitions,” in which one great power sustains economic growth rates at higher levels than its rivals.¹⁵

It might not be obvious, at first glance, why I focus on economic power. After all, power is a multidimensional, contested concept that comes in many other forms. The salience of certain power resources depends on the context in which a country draws upon them to exert influence.¹⁶ For my purposes, differentials in economic growth are the most relevant considerations for intensifying hegemonic rivalry. An extensive literature has demonstrated that changes in relative economic growth often precede hegemonic wars.¹⁷

Moreover, changes in global political and military leadership often follow shifts in economic leadership. As the most fungible mode of power, economic strength undergirds a nation’s influence in global politics and its military capabilities.¹⁸ The outcomes of interstate conflicts bear out that economic and productive capacity is the foundation of military power.¹⁹ Paul Kennedy concludes that

all of the major shifts in the world’s military-power balances have followed alterations in the productive balances … the rising and falling of the various empires and states in the international system has been confirmed by the outcomes of the major Great Power wars, where victory has always gone to the side with the greatest material resources.²⁰

How does one identify if or when an economic power transition occurs? Phrased differently, how many years does a great power need to lead its rivals in economic growth rates? How large does that gap have to reach? Throughout this book, I judge whether an economic power transition has occurred based on one great power attaining a lead in overall economic productivity over its rivals by sustaining higher rates of productivity growth.²¹ Productivity growth ensures that efficient and sustainable processes are fueling growth in total economic output. Additionally, productivity is the most important determinant of economic growth in the long run, which is the appropriate time horizon for understanding power transitions. “Productivity isn’t everything, but in the long run it is almost everything,” states Nobel Prize–winning economist Paul Krugman.²²
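The questions about duration and gap size can be made concrete with simple compounding arithmetic. The sketch below is a minimal illustration under stated assumptions: the function name `years_to_overtake`, the starting productivity levels, and the growth rates are all hypothetical, and constant annual growth is assumed purely for simplicity.

```python
from typing import Optional


def years_to_overtake(
    leader_level: float,
    challenger_level: float,
    leader_growth: float,
    challenger_growth: float,
    horizon: int = 500,
) -> Optional[int]:
    """Return the first year in which the challenger's productivity exceeds the leader's.

    Growth rates are annual fractions (0.02 means 2 percent) and are assumed
    constant, which is a simplification. Returns None if no crossover occurs
    within the horizon.
    """
    if challenger_level > leader_level:
        return 0
    if challenger_growth <= leader_growth:
        return None  # the gap never closes
    for year in range(1, horizon + 1):
        leader_level *= 1 + leader_growth
        challenger_level *= 1 + challenger_growth
        if challenger_level > leader_level:
            return year
    return None


# Hypothetical case: challenger starts at 80 percent of the leader's productivity.
print(years_to_overtake(100, 80, 0.01, 0.02))  # 23
print(years_to_overtake(100, 80, 0.01, 0.03))  # 12
```

Under these made-up figures, a one-point annual advantage takes more than two decades to produce a change in productivity leadership, which is consistent with treating the long run as the relevant time horizon.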

Alternative conceptualizations of economic power cannot capture how effectively a country translates technological advance into national economic growth. Theories of geo-economics, for instance, highlight a state’s balance of trade in certain technologically advanced industries.²³ Other studies emphasize a state’s share of world-leading firms.²⁴ National rates of innovation, while more inclusive, measure the generation of novel technologies but not diffusion across commercial applications, thereby neglecting the ultimate impact of technological change.²⁵ Compared to these indicators, which account for only a small portion of the value-added activities in the economy, productivity provides a more comprehensive measure of economic leadership.²⁶

This focus on productivity is supported by recent work on power measurement, which questions measures of power resources based on economic size. Without accounting for economic efficiency, solely relying on measures of gross economic and industrial output provides a distorted view of the balance of power, particularly where one side is populous but poor.²⁷ If national power were measured by GDP alone, China would have been the world’s most powerful country during the First Industrial Revolution. However, China’s economy was far from the productivity frontier. In fact, as the introduction chapter spotlighted, the view that China fell behind the West because it could not capitalize on productivity-boosting technological breakthroughs is firmly entrenched in the minds of leading Chinese policymakers and thinkers.

Lastly, it is important to clarify that I limit my analysis of productivity differentials to great powers.²⁸ In some measures of productivity, other countries may rank highly or even outrank the countries I study in my cases. In the current period, Switzerland and other countries have higher GDP per capita than the United States; before World War I, Australia was the world leader in productivity, as measured by GDP per labor-hour.²⁹ However, smaller powers like pre–World War I Australia and present-day Switzerland are excluded from my study of economic power transitions, as they lack the baseline economic and population size to be great powers.³⁰

There is no exact line that distinguishes great powers from other countries.³¹ Kennedy’s seminal text The Rise and Fall of the Great Powers, for instance, has been challenged for not providing a precise definition of great power.³² Fortunately, across all the case studies in this book, there is substantial agreement on the great powers of the period. According to one measure of the distribution of power resources, which spans 1816 to 2012 and incorporates both economic size and efficiency, all the countries I study rank among the top six at the beginning of the case.³³

The Diffusion of GPTs

Scholars often gravitate to technological change as the source of a power transition in which the mantle of industrial preeminence changes hands. However, there is less clarity over the process by which technical breakthroughs translate into this power shift among countries at the technological frontier. I argue that the diffusion of GPTs is the key to this mechanism. In this section, I first outline why my theory privileges GPTs over other types of technology. I then set forth why diffusion should be prioritized over other phases of technological change, especially innovation. Finally, I position GPT diffusion theory against the leading sector (LS) model, which is the standard explanation in the international relations literature.

Why GPTs?

Not all technologies are created equal. When assessed on their potential to transform the productivity of nations, some technical advances, such as the electric dynamo, rank higher than others, such as an improved sleeping bag. My theory gives pride of place to GPTs, such as electricity and the steam engine, which have historically generated waves of economy-wide productivity growth.³⁴ Assessed on their own merits alone, even the most transformative technological changes do not tip the scale far enough to significantly affect aggregate economic productivity.³⁵ GPTs are different because their impact on productivity comes from accumulated improvements across a wide range of complementary sectors; that is, they cannot be judged on their own merits alone.

Recognized by economists and economic historians as “engines of growth,” GPTs are defined by three characteristics.³⁶ First, they offer great potential for continual improvement. While all technologies offer some scope for improvement, a GPT “has implicit in it a major research program for improvements, adaptations, and modifications.”³⁷ Second, GPTs acquire pervasiveness. As a GPT evolves, it finds a “wide variety of uses” and a “wide range of uses.”³⁸ The former refers to the diversity of a GPT’s use cases, while the latter alludes to the breadth of industries and individuals using a GPT.³⁹ Third, GPTs have strong technological complementarities. In other words, the benefits from innovations in GPTs come from how other linked technologies are changed in response and cannot be modeled from a mere reduction in the costs of inputs to the existing production function. For example, the overall energy efficiency gains from merely replacing a steam engine with an electric motor were minimal; the major benefits from factory electrification came from electric “unit drive,” which enabled machines to be driven individually by electric motors, and a radical redesign of plants.⁴⁰

Taken together, these characteristics suggest that the full impact of a GPT materializes via an “extended trajectory” that differs from those associated with other technologies. Economic historian Paul David explains:

We can recognize the emergence of an extended trajectory of incremental technical improvements, the gradual and protracted process of diffusion into widespread use, and the confluence with other streams of technological innovation, all of which are interdependent features of the dynamic process through which a general purpose engine acquires a broad domain of specific applications.⁴¹

For example, the first dynamo for industrial application was introduced in the 1870s, but the major boost of electricity to overall manufacturing productivity did not occur until the 1920s. Like other GPT trajectories, electrification required a protracted process of workforce skill adjustments, organizational adaptations, such as changes in factory layout, and complementary innovations like the steam turbine, which enabled central power generation in the form of utilities.⁴² To track the full impact of these engines of growth, one must travel the long roads of their diffusion.

Why Diffusion?

All technological trajectories can be divided into a phase when the technology is incubated and then first introduced as a viable commercial application (“innovation”) and a phase when the innovation spreads through a population of potential users, both nationally and internationally (“diffusion”).⁴³ Recognizing this commonly accepted distinction, other studies of the scientific and technological capabilities of nations primarily focus on innovation.⁴⁴ I depart from other works by giving priority to diffusion, since that is the phase of technological change most significant for GPTs.⁴⁵

Undeniably, the activities and conditions that produce innovation can also spur diffusion.⁴⁶ A firm’s ability to conduct breakthrough R&D does not just create new knowledge but also boosts its capacity to assimilate innovations from external sources (“absorptive capacity”).⁴⁷ Faced with an ever-shifting technological frontier, building competency at producing new innovations gives a firm the requisite prior knowledge for identifying and commercializing external innovations. Other studies extend these insights beyond firms to regional and national systems.⁴⁸ In order to absorb and diffuse technological advances first incubated elsewhere, they argue, nations must invest in a certain level of innovative activities.

This connection between innovation capacity and absorptive capacity could call into question the GPT mechanism’s attention to diffusion. Possibly, a country’s lead in GPT innovation could also translate directly into a relative advantage in GPT diffusion.⁴⁹ Scholarship on the agglomeration benefits of innovation hot spots, such as Silicon Valley, supports this case to some extent. Empirical analyses of patent citations indicate that knowledge spillovers from GPTs tend to cluster within a geographic region.⁵⁰ In the case of electricity, Robert Fox and Anna Guagnini underscore that it was easier for countries with firms at the forefront of electrical innovation to embrace electric power at scale. The interconnections between the “learning by doing” gained on the job in these leading firms and academic labs separated nations in the “fast lane” and “slow lane” of electrification.⁵¹

Being the first to pioneer new technologies could benefit a state’s capacity to absorb and diffuse GPTs, but it is not determinative. A country’s absorptive capacity also depends on many other factors, including institutions for technology transfer, human capital endowments, openness to trade, and information and communication infrastructure.⁵² Sometimes the “advantages of backwardness” allow laggard states to adopt new technologies faster than the states that pioneer such advances.⁵³ In theory and practice, a country’s ability to generate fundamental, new-to-the-world innovations can widely diverge from its ability to diffuse such advances.

This potential divergence is especially relevant for advanced economies, which encompass the great powers that are the subject of this research. Although innovation-centered explanations do well at sorting the advantages of technological breakthroughs to countries at the technological frontier compared to those trying to catch up, they are less effective at differentiating among advanced economies. As supported by a wealth of econometric research, divergences in the long-term economic growth of countries at the technology frontier are shaped more by imitation than innovation.⁵⁴ These advanced countries have firms that can quickly copy or license innovations; first mover advantages from innovations are thus limited even in industries, like pharmaceuticals, that enforce intellectual property rights most strictly.⁵⁵ Nevertheless, advanced countries that are evenly matched in their capacity for radical innovation can undertake vastly different growth trajectories in the wake of technological revolutions. Differences in diffusion pathways are central to explaining this puzzle.

This diffusion-centered approach is especially well suited for GPTs. Since GPTs entail gradual evolution into widespread use, there is a longer window for competitors to adopt GPTs more intensively than the leading innovation center. In other technologies, first-mover benefits from pioneering initial breakthroughs are more significant. In electric power, for instance, leadership in innovation was fiercely contested among the industrial powers. The United States, Germany, Great Britain, and France all built their first central power stations within a span of three years (1882–1884), their first electric trams within a span of nine years (1887–1896), and their first three-phase AC power systems within a span of eight years (1891–1899).⁵⁶ However, the United States clearly led in the diffusion of these systems: by 1912, its electricity production per capita was more than double that of Germany, its closest competitor.⁵⁷ Thus, while most countries at the technological frontier will be able to compete in the production and innovation of GPTs, the hardest hurdles in the GPT trajectory are in the diffusion phase.

GPT Diffusion and LS Product Cycles

GPT diffusion challenges the LS-based account of how technological change drives power transitions. The standard explanation in the international relations literature emphasizes a country’s dominance in leading sectors, defined as new industries that experience rapid growth on the back of new technologies.⁵⁸ Cotton textiles, steel, chemicals, and the automobile industry form a “classic sequence” of “great leading sectors,” developed initially by economist Walt Rostow and later adapted by political scientists.⁵⁹ Under the LS mechanism, a country’s ability to maintain a monopoly on innovation in these emerging industries determines the rise and fall of lead economies.⁶⁰

This model of technological change and power transition builds on the international product life cycle, a concept pioneered by Raymond Vernon. Constructed to explain patterns of international trade, the cycle begins with a product innovation and subsequent sales growth in the domestic market. Once the domestic market is saturated, the new product is exported to foreign markets. Over time, production shifts to these markets, as the original innovating country loses its comparative advantage.⁶¹

LS-based studies frequently invoke the product cycle model.⁶² Analyzing the effects of leading sectors on the structure of the international system, Gilpin states, “Every state, rightly, or wrongly, wants to be as close as possible to the innovative end of ‘the product cycle.’ ”⁶³ One scholar described Gilpin’s US Power and the Multinational Corporation, one of the first texts that outlines the LS mechanism, as “[having] drawn on the concept of the product cycle, expanded it into the concept of the growth and decline of entire national economies, and analyzed the relations between this economic cycle, national power, and international politics.”⁶⁴

The product cycle’s assumptions illuminate the differences between the GPT and LS mechanisms along three key dimensions. In the first stage of the product cycle, a firm generates the initial product innovation and profits from sales in the domestic market before saturation. Extending this model to national economies, the LS mechanism emphasizes the clustering of LS innovations and attendant monopoly profits in a single nation.⁶⁵ “The extent of national success that we have in mind is of the fairly extreme sort,” write George Modelski and William Thompson. “One national economy literally dominates the leading sector during its phase of high growth and is the primary beneficiary of the immediate profits.”⁶⁶ The GPT trajectory, in contrast, places more value on where technologies are diffused than where an innovation is first pioneered.⁶⁷ I refer to this dimension as the “phase of relative advantage.”

In the next stage, the product innovation spreads to global markets and the technology gradually diffuses to foreign competitors. Monopoly rents associated with a product innovation dissipate as production becomes routinized and transfers to other countries.⁶⁸ Mirroring this logic, Modelski and Thompson write, “[Leading sectors] bestow the benefits of monopoly profits on the pioneer until diffusion and imitation transform industries that were once considered radically innovative into fairly routine and widespread components of the world economy.”⁶⁹ Thompson also states that “the sector’s impact on growth tends to be disproportionate in its early stages of development.”⁷⁰

The GPT trajectory assumes a different impact timeframe. The more wide-ranging the potential applications of a technology, the longer the lag between its initial emergence and its ultimate economic impact. This explains why the envisioned transformative impact of GPTs does not appear straightaway in the productivity statistics.⁷¹ Time for complementary innovations, organizational restructuring, and institutional adjustments such as human capital formation is needed before the full impact of a GPT can be known. It is precisely the period when diffusion transforms radical innovations into routine components of the economy—the stage when the causal effects of leading sectors are expected to dissipate—that generates the productivity gap between nations.

The product cycle also reveals differences between the LS and GPT mechanisms regarding the “breadth of growth.” Mirroring the product cycle’s focus on an innovation’s life cycle within a single industry, the LS mechanism emphasizes the contributions of a limited number of new industries to economic growth in a particular period. GPT-fueled productivity growth, on the other hand, is dispersed across a broad range of industries.⁷² Table 2.1 specifies how LS product cycles differ from GPT diffusion along the three dimensions outlined here. As the following section will show, the differences in these two technological trajectories shape the institutional factors that are most important for national success in adapting to periods of technological revolution.

Table 2.1 Two Mechanisms of Technological Change and Power Transitions

Mechanisms	Impact Timeframe	Phase of Relative Advantage	Breadth of Growth	Institutional Complements
LS product cycles	Lopsided in early stages	Monopoly on innovation	Concentrated	Deepen skill base in LS innovations
GPT diffusion	Lopsided in later stages	Edge in diffusion	Dispersed	Widen skill base in spreading GPTs

While I have highlighted the differences between GPT diffusion and LS product cycles, it is important to recognize that there are similarities between the two pathways.⁷³ Some scholars, for example, associate leading sectors with broad spillovers across economic sectors.⁷⁴ In addition, lists of leading sectors and lists of GPTs sometimes overlap, as evidenced by the fact that electricity is a consensus inclusion on both lists. Moreover, both explanations begin with the same premise: to fully uncover the dynamics of technology-driven power transitions, it is essential to specify which new technologies are the key drivers of economic growth in a particular time window.⁷⁵

At the same time, these resemblances should not be overstated. Many classic leading sectors do not have general-purpose applications. For instance, cotton textiles and automobiles both feature on Rostow’s series of leading sectors, and they are studied as leading sectors because each has “been the largest industry for several major industrial nations in the West at one time or another.”⁷⁶ Although these were certainly both fast-growing large industries, the underlying technological advances do not fulfill the characteristics of GPTs. In addition, many of the GPTs I examine do not qualify as leading sectors. The machine tool industry in the mid-nineteenth century, for instance, was not a new industry, and it was never even close to being the largest industry in any of the major economies. Most importantly, though the GPT and LS mechanisms sometimes point to similar technological changes, they present very different understandings of how revolutionary technologies bring about an economic power transition. As the next section reveals, these differences also map onto varied institutional adaptations.

GPT Skill Infrastructure

New technologies agitate existing institutional patterns.⁷⁷ They appeal for government support, generate new collective interests in the form of technical societies, and induce organizations to train people in relevant fields. If institutional environments are slow or fail to adapt, the development of emerging technologies is hindered. As Gilpin articulates, a nation’s technological fitness is rooted in the “extent of the congruence” between its institutions and the demands of evolving technologies.⁷⁸ This approach is rooted in a rich tradition of work on the coevolution of technology and institutions.⁷⁹

Understanding the demands of GPTs helps filter which institutional factors are most salient for how technological revolutions bring about economic power transitions. Which institutional factors dictate disparities in GPT adoption among great powers? Specifically, I emphasize the role of education and training systems that broaden the base of engineering skills linked to a particular GPT. This set of institutions, which I call “GPT skill infrastructure,” is most crucial for facilitating the widespread adoption of a GPT.

To be sure, GPT diffusion is dependent on institutional adjustments beyond GPT skill infrastructure. Intellectual property regimes, industrial relations, financial institutions, and other institutional factors could affect GPT diffusion. Probing inter-industry differences in technology adoption, some studies find that less concentrated industry structures are positively linked to GPT adoption.⁸⁰ I limit my analysis to institutions of skill formation because their effects permeate other institutional arrangements.⁸¹ GPT skill infrastructure provides a useful indicator for other institutions that standardize and spread the novel best practices associated with GPTs.⁸²

It should also be noted that the institutional approach is one of three main categories of explanation for cross-country differences in economic performance over the long term.⁸³ Other studies document the importance of geography and culture to persistent cross-country income differences.⁸⁴ I prioritize institutional explanations for two reasons. First, natural experiments from certain historical settings, in which institutional divergence occurs but geographical and cultural factors are held constant, suggest that institutional differences are particularly influential sources of long-term economic growth differentials.⁸⁵ Second, since LS-based accounts of power transitions also prioritize institutional adaptations to technological change, my approach provides a more level test of GPT diffusion against the standard explanation.⁸⁶

One final note concerns the limits of my argument’s scope. I do not investigate the deeper origins of why some countries are more effective than others at developing GPT skill infrastructure. Possibly, the intensity of political competition and the inclusiveness of political institutions influence the development of skill formation institutions.⁸⁷ Other fruitful lines of inquiry stress the importance of government capacity to make intertemporal bargains and adopt long time horizons in making technology investments.⁸⁸ It is worth noting that a necessary first step to productively exploring these underlying causes is to establish which types of technological trajectories and institutional adaptations are at work. For instance, LS product cycles may be more closely linked to mercantilist or state capitalist approaches that favor narrow interest groups, whereas political systems that incorporate a broader group of stakeholders may better accommodate GPT diffusion pathways.

Institutions Fit for GPT Diffusion

If GPTs drive economic power transitions, which institutions fit best with their demands? Institutional adaptations for GPT diffusion must solve two problems. First, since the economic benefits of GPTs materialize through improvements across a broad range of industries, capturing these benefits requires extensive coordination between the GPT sector and numerous application sectors. Given the sheer scope of potential applications, it is infeasible for firms in the GPT sector to commercialize the technology on their own, as the necessary complementary assets are embedded in different firms and industries.⁸⁹ In the AI domain, as one example of a potential GPT, firms that develop general machine learning algorithms will not have access to all the industry-specific data needed to fine-tune those algorithms to particular application scenarios. Thus, coordination between the GPT sector and other organizations that provide complementary capital and skills, such as academia and competitor firms, is crucial. In contrast, for technologies that are not general-purpose, this type of coordination is less beneficial and could even be detrimental to a nation’s competitive advantage, as the innovating firm could leak its technical secrets.⁹⁰

Second, GPTs pose demanding conditions for human capital adjustments. In describing the connection between skill formation and technological fitness, scholars often delineate between general skills and industry-specific skills. According to this perspective, skill formation institutions that optimize for the former are more conducive to technological domains characterized by radical innovation, while institutions that optimize for the latter are more favorable for domains marked by incremental innovation.⁹¹ GPT diffusion entails both types of skill formation. The skills must be specific to a rapidly changing GPT domain but also broad enough to enable a GPT’s advance across many industries.⁹² Strong linkages between R&D-intensive organizations at the technological frontier and application areas far from the frontier also play a key role in GPT diffusion. This draws attention to the interactions between researchers who produce innovations and technicians who help absorb them into specific contexts.⁹³

Education and training systems that foster relevant engineering skills for a GPT, or what I call GPT skill infrastructure, address both constraints. Engineering talent fulfills the need for skills that are rooted in a GPT yet sufficiently flexible to implement GPT advances in a wide range of sectors. Broadening the base of engineering knowledge also helps standardize best practices with GPTs, thereby coordinating information flows between the GPT sector and application sectors. Standardization fosters GPT diffusion by committing application sectors to specific technological trajectories and encouraging complementary innovations.⁹⁴ This unlocks the horizontal spillovers associated with GPTs.⁹⁵

Indeed, distinct engineering specializations have emerged in the wake of each new GPT. New disciplines, such as chemical engineering and electrical engineering, have proved essential in widening the associated knowledge bases.⁹⁶ Computer science, another engineering-oriented field, was central to US leadership in the information revolution.⁹⁷ These professions developed alongside new technical societies, ranging from the American Society of Mechanical Engineers to the Internet Engineering Task Force, that formulated and disseminated guidelines and benchmarks for GPT development.⁹⁸

Clearly, the features of GPT skill infrastructure have changed over time. Whereas informal associations systematized the skills crucial for mechanization in the eighteenth century, formal higher education has become increasingly integral to computerization in the twenty-first century.⁹⁹ Some evidence suggests that computers and other technologies are skill-biased, in the sense that they favor workers with more years of schooling.¹⁰⁰ These trends complicate but do not undercut the concept of GPT skill infrastructure. Regardless of the extent of formal training, all configurations of GPT skill infrastructure perform the same function: to widen the pool of engineering skills and knowledge associated with a GPT. This can take place in universities as well as in informal associations, provided these settings train engineers and facilitate the flow of engineering knowledge between knowledge-creation centers and application sectors.¹⁰¹

Which Institutions Matter?

The institutional competencies for exploiting LS product cycles are different. Historical analysis informed by this frame highlights heroic inventors like James Watt and pioneering research labs at large companies.¹⁰² Studying which countries benefited most from emerging technologies over the past two centuries, Herbert Kitschelt prioritizes the match between the properties of new technologies and sectoral governance structures. Under his framework, for example, tightly coupled technological systems with high causal complexity, such as nuclear power systems and aerospace platforms, are more likely to flourish in countries that allow for extensive state support.¹⁰³ In other studies, the key institutional factors behind success in LS product cycles are education systems that subsidize scientific training and R&D facilities in new industries.¹⁰⁴

These approaches equate technological leadership with a state’s success in capturing market shares and monopoly profits in new industries.¹⁰⁵ In short, they use LS product cycles as the filter for which institutional variables matter. Existing scholarship lacks an institutional explanation for why some great powers are more successful at GPT diffusion.

Competing interpretations of technological leadership in chemicals during the late nineteenth century crystallize these differences. Based on the LS template, the standard account attributes Germany’s dominance in the chemical industry (as represented by its control over 90 percent of global production of synthetic dyes) to its investments in scientific research and highly skilled chemists.¹⁰⁶ Germany’s dynamism in this leading sector is taken to explain its overall industrial dominance.¹⁰⁷

GPT diffusion spotlights a different relationship between technological change and institutional adaptation. The focus turns toward institutions that complemented the extension of chemical processes to a wide range of industries beyond synthetic dye, such as food production, metals, and textiles. Under the GPT mechanism, the United States, not Germany, achieved leadership in chemicals because it first institutionalized chemical engineering as a discipline. Despite its disadvantages in synthetic dye production and chemical research, the United States was more effective in broadening the base of chemical engineering talent and coordinating information flows between fundamental breakthroughs and industrial applications.¹⁰⁸

It is important to note that some parts of the GPT and LS mechanisms can coexist without conflict. A state’s capacity to pioneer new technologies can correlate with its capacity to absorb and diffuse GPTs. Countries that are home to cutting-edge R&D infrastructure may also be fertile ground for education systems that widen the pool of GPT-linked engineering skills. However, these aspects of the LS mechanism are not necessary for the GPT mechanism to operate. In accordance with GPT diffusion theory, a state can capitalize on GPTs to become the most powerful economy without monopolizing LS innovation.

Moreover, other dimensions of these two mechanisms directly conflict. When it comes to impact timeframe and breadth of growth, the GPT and LS mechanisms advance opposing expectations. Institutions suited for GPT diffusion can diverge from those optimized for creating new-to-the-world innovations. Research on human capital and long-term growth separates the effects of engineering capacity, which is commonly tied to adoptive activities, from those of other forms of human capital that are more often connected to inventive activities.¹⁰⁹ This divergence can also be seen in debates over the effects of competition on technological activity. On the one hand, Joseph Schumpeter and others have argued that monopoly structures incentivize more R&D activity because the monopolists can appropriate all the gains from technological innovation.¹¹⁰ On the other hand, empirical work demonstrates that more competitive market structures increase the rate of technological adoption across firms.¹¹¹ Thus, while there is some overlap between these two mechanisms, they can still be set against each other in a way that improves our understanding of technological revolutions and power transitions.

This theoretical framework differs from related work on the political economy of technological change.¹¹² Scholars attribute the international competitiveness of nations to broader institutional contexts, including democracy, national innovation systems, and property rights enforcement.¹¹³ Since this book is limited to the study of shifts in productivity leadership at the technological frontier, many of these factors, such as those related to basic infrastructure and property rights, will not explain differences among technologically advanced nations.

In addition, most of the institutional theories put forth to explain the productivity of nations are technology-agnostic, in that they treat all forms of technological change equally. To borrow language from a former chairman of the US Council of Economic Advisers, they do not differentiate between an innovation in potato chips and an innovation in microchips.¹¹⁴ In contrast, I am specific about GPTs as the sources of shifts in competitiveness at the technological frontier.

Other theories identify key technologies but leave institutional factors at a high level of abstraction. Some scholars, for instance, posit that the lead economy’s monopoly on leading-sector innovation eventually erodes because of “ubiquitous institutional rigidities.”¹¹⁵ Unencumbered by the vested interests that resist disruptive technologies, rising challengers inevitably overtake established powers. Because these explanations are underspecified, they cannot account for cases where rich economies expand their lead or where poorer countries do not catch up.¹¹⁶

When interpreting great power competition at the technological frontier, adjudicating between the GPT and LS mechanisms represents a choice between two different visions. The latter prioritizes being the first country to introduce novel technologies, whereas the former places more value on disseminating and transforming innovations after their inception. In sum, industrial competition among great powers is not a sprint to determine which one can create the most brilliant Silicon Valley; it is a marathon won by the country that can cultivate the closest connections between its Silicon Valleys and its Iowa Citys.

Alternative Explanations

Although I primarily set GPT diffusion theory against the LS model, I also consider two other prominent explanations that make specific claims about how technological breakthroughs differentially advantage leading economies. Crucially, these two lines of thinking could account for differences in GPT diffusion, nullifying the import of GPT skill infrastructure.

Threat-Based Arguments

According to one school of thought, international security threats motivate states to invest in science and technology.¹¹⁷ When confronted with more threatening geopolitical landscapes, states are more incentivized to break down status quo interests and build institutions conducive to technological innovation.¹¹⁸ Militaries hold outsized influence in these accounts. For example, Vernon Ruttan argues that military investment, mobilized against war or the threat of war, fueled commercial advances in six technologies designated as GPTs.¹¹⁹ Studies of the success of the United States and Japan with emerging technologies also stress interconnections between military and civilian technological development.¹²⁰ I group these related arguments under the category of threat-based theories.

Related explanations link technological progress with the balance of external threats and domestic roadblocks. Mark Taylor’s “creative insecurity” theory describes how external economic and military pressures permit governments to break from status quo interest groups and promote technological innovation. He argues that the difference between a nation’s external threats and its internal rivalries determines its propensity for innovation: the greater the difference, the greater the national innovation rate.¹²¹ Similarly, “systemic vulnerability” theory emphasizes the influence of external security and domestic pressures on the will of leaders to invest in institutions conducive to innovation, as well as the effect of “veto players” on their ability to do so.¹²²

Certainly, external threats could impel states to invest more in GPTs, and military investment can help bring forth new GPTs; however, there are several issues with adapting threat-based theories to explain differences in GPT diffusion across great powers. First, threat-based arguments tend to focus on the initial incubation of GPTs, as opposed to the gradual spread of GPTs throughout a national economy. During the latter phase, a great deal of evidence suggests that civilian and military needs can greatly conflict.¹²³ Besides, some GPTs, such as electricity in the United States, developed without substantial military investment. Since other civilian institutions could fill in as strong sources of demand for GPTs, military procurement may not be necessary for spurring GPT diffusion. Institutional adjustments to GPTs therefore can be motivated by factors other than threats. Ultimately, to further probe these points of clash, the impact of security threats and military investment must be traced within the historical cases.

Varieties of Capitalism

The “varieties of capitalism” (VoC) explanation highlights differences among developed democracies in labor markets, industrial organization, and interfirm relations and separates them into coordinated market economies (CMEs) and liberal market economies (LMEs). VoC scholars argue that CMEs are more suited for incremental innovations because their thick intercorporate networks and protected labor markets favor gradual adoption of new technological advances. LMEs, in contrast, are more adept at radical innovation because their fluid labor markets and corporate organization make it easier for firms to reorganize themselves around disruptive technologies. Most relevant to GPT diffusion theory, VoC scholars argue that LMEs incentivize workers to acquire general skills, which are more accommodative to radical innovation, whereas CMEs support industry-specific training, which is more favorable for incremental innovation.¹²⁴

It is possible that differences between market-based capitalism and strategically coordinated capitalism account for GPT diffusion gaps between nations. Based on the expectations of the VoC approach, LMEs should be more likely to generate innovations with the potential to become GPTs, and workers in LMEs should possess more general skills that could spread GPTs across firms.¹²⁵ Examining the pattern of innovation during the information revolution, scholars find that the United States, an LME, concentrated on industries experiencing radical innovation, such as semiconductors and telecommunications, while Germany, a CME, specialized in domains characterized by incremental innovation, such as mechanical engineering and transport.¹²⁶

Despite bringing vital attention to the diversity of skill formation institutions, VoC theory’s dichotomy between general and industry-specific skills does not dovetail with the skills demanded by specific GPTs.¹²⁷ Cutting across this sometimes arbitrary distinction, the engineering skills highlighted in GPT diffusion theory are specific to a fast-evolving GPT field and general enough to transfer ideas from the GPT sector across various sectors. Software engineering skills, for instance, are portable across multiple industries, but their reach is not as ubiquitous as critical thinking skills or mathematics knowledge. To address similar gaps in skill classifications, many political economists have appealed for “a more fine-grained analysis of cross-national differences in the particular mix of jobs and qualifications that characterize different political economies.”¹²⁸ In line with this move, GPT skill infrastructure stands in for institutions that supply the particular mix of jobs and qualifications for enabling GPT diffusion. The empirical analysis provides an opportunity to examine whether this approach should be preferred to the VoC explanation for understanding technology-driven power transitions.

Research Methodology

My evaluation of the GPT and LS mechanisms primarily relies on historical case studies, which allow for detailed exploration of the causal processes that connect technological change to economic power transitions. Employing congruence-analysis techniques, I select cases and assess the historical evidence in a way that ensures a fair and rigorous test of the relative explanatory power of the two mechanisms.¹²⁹ This sets up a “three-cornered fight” among GPT diffusion theory, the rival LS-based explanation, and the set of empirical information.¹³⁰

The universe of cases most useful for assessing the GPT and LS mechanisms consists of technological revolutions (cause) that produced an economic power transition (outcome) in the industrial period. Following guidance on testing competing mechanisms that prioritizes typical cases where the cause and outcome are clearly present, I investigate the First Industrial Revolution (IR-1) and the Second Industrial Revolution (IR-2).¹³¹ Both cases featured clusters of disruptive technological advances, highlighted by some studies as “technological revolutions” or “technology waves.”¹³² They also saw economic power transitions, when one great power sustained growth rates at substantially higher levels than its rivals.¹³³ I also study Japan’s challenge to American economic leadership, which ultimately failed, in the Third Industrial Revolution (IR-3). This deviant case can disconfirm mechanisms and help explain why they break down.¹³⁴

These cases are crucial for testing the GPT mechanism against the LS mechanism. All three cases favor the latter in terms of background conditions and existing theoretical explanations. Scholarship has attributed shifts in economic power during this period to the rise of new leading sectors.¹³⁵ Thus, if the empirical results support the GPT mechanism, then my findings would suggest a need for major modifications to our understanding of how technological revolutions affect the rise and fall of great powers. The qualitative analysis appendix provides further details on case selection, including the universe of cases, the justification for these cases as “most likely cases” for the LS mechanism, and relevant scope conditions.¹³⁶

This overall approach adapts the methodology of process-tracing, often conducted at the individual or micro level, to macro-level mechanisms that involve structural factors and evolutionary interactions.¹³⁷ Existing scholarship on diffusion mechanisms, which the GPT mechanism builds from, emphasizes the influence of macro-level processes. In these accounts the diffusion trajectory depends not just on the overall distribution of individual-level receptivity but also on structural and institutional features, such as the degree of interconnectedness in a population.¹³⁸ This approach aligns with a view of mechanistic thinking that allows for mechanisms to be set at different levels of abstraction.¹³⁹ As Tulia Falleti and Julia Lynch point out, “Micro-level mechanisms are no more fundamental than macro-level ones.”¹⁴⁰

To judge the explanatory strength of the LS and GPT mechanisms, I employ within-case congruence tests and process-tracing principles to evaluate the predictions of the two theoretical approaches against the empirical record.¹⁴¹ In each historical case, I first trace how leading sectors and GPTs developed in the major economies, paying particular attention to adoption timeframes, the technological phase of relative advantage, and the breadth of growth—three dimensions that differentiate GPT diffusion from LS product cycles.¹⁴²

For example, my assessment of the two mechanisms along the impact timeframe dimension follows consistent procedures in each case.¹⁴³ To evaluate when certain technologies were most influential, I establish when they initially emerged (based on dates of key breakthroughs), when their associated industries were growing fastest, and when they diffused across a wide range of application sectors. When data are available, I estimate a GPT’s initial arrival date by also factoring in the point at which it reached a 1 percent adoption rate in the median sector.¹⁴⁴ Industry growth rates, diffusion curves, and output trends all help measure the timeline along which technological breakthroughs substantially influenced the overall economy. The growth trajectory of each candidate GPT and LS is then set against a detailed timeline of when a major shift in productivity leadership occurs.
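
The 1 percent threshold mentioned above can be illustrated with a minimal sketch. It is not the procedure used in the case studies. It assumes adoption rates are available as fractions by sector and year, and it treats a missing observation as zero adoption. The function name `arrival_year`, the sector labels, and all figures are hypothetical placeholders, not historical data.

```python
from statistics import median


def arrival_year(adoption_by_sector, threshold=0.01):
    """Return the first year in which the median sector's adoption rate
    reaches `threshold`, or None if it never does.

    adoption_by_sector maps a sector name to {year: adoption rate as a fraction}.
    Assumption: a year with no observation for a sector counts as 0.0.
    """
    years = sorted({year for series in adoption_by_sector.values() for year in series})
    for year in years:
        rates = [series.get(year, 0.0) for series in adoption_by_sector.values()]
        if median(rates) >= threshold:
            return year
    return None


# Hypothetical placeholder data, for illustration only.
adoption = {
    "sector_a": {1800: 0.002, 1810: 0.012, 1820: 0.050},
    "sector_b": {1800: 0.015, 1810: 0.030, 1820: 0.060},
    "sector_c": {1800: 0.000, 1810: 0.004, 1820: 0.020},
}

print(arrival_year(adoption))
```

Using the median rather than the mean keeps a single early-adopting sector (here `sector_b`) from pulling the estimated arrival date forward, which matches the emphasis on breadth of diffusion.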

I then turn to the institutional factors that could explain why some countries were more successful in adapting to a technological revolution, with a focus on the institutions best suited to the demands of GPTs and leading sectors.¹⁴⁵ If the GPT mechanism is operative, the state that attains economic leadership should have an advantage in institutions that broaden the base of engineering human capital and spread best practices linked to GPTs. Additional evidence of the GPT diffusion theory’s explanatory power would be that other countries had advantages in institutions that complement LS product cycles, such as scientific research infrastructure and sectoral governance structures.

These evaluation procedures are effective because I have organized the competing mechanisms “so that they are composed of the same number of diametrically opposite parts with observable implications that rule each other out.”¹⁴⁶ This allows for evidence in favor of one explanation to be doubly decisive in that it also undermines the competing theory.¹⁴⁷ In sum, each case study is structured around investigating a set of four standardized questions that correspond to the three dimensions of the LS and GPT mechanisms as well as the institutional complements to technological trajectories (table 2.2).¹⁴⁸

Table 2.2 Testable Propositions of the LS and GPT Mechanisms

Dimensions	Key Questions	LS Propositions	GPT Propositions
Impact timeframe	When do revolutionary technologies make their greatest marginal impact on the economic balance of power?	New industries make their greatest impact on growth differentials in early stages.	GPTs do not make a significant impact on growth differentials until multiple decades after emergence.
Key phase of relative advantage	Do monopoly profits from innovation or benefits from more successful diffusion drive growth differentials?	A state’s monopoly on innovation in leading sectors propels it to economic leadership.	A state’s success in widespread adoption of GPTs propels it to economic leadership.
Breadth of growth	What is the breadth of technology-driven growth?	Technological advances concentrated in a few leading sectors drive growth.	Technological advances dispersed across a broad range of GPT-linked industries drive growth.
Institutional complements	Which types of institutions are most advantageous for national success in technological revolutions?	Key institutional adaptations help a state capture market shares and monopoly profits in new industries.	Key institutional adaptations widen the base of engineering skills and knowledge for GPT diffusion.

In each case study, I consider alternative theories of technology-driven power transitions. Countless studies have examined the rise and fall of great powers. My aim is not to sort through all possible causes of one nation’s rise or another’s decline. Rather, I am probing the causal processes behind an established connection between technological advances in each industrial revolution and an economic power transition. The VoC framework and threat-based theories outline alternative explanations for how significant technological advances translated into growth differentials among great powers. Across all the cases, I assess whether they provide a better explanation for the historical case evidence than the GPT and LS mechanisms.

I also address case-specific confounding factors. For example, some scholars argue that abundant inputs of wood and metals induced the United States to embrace more machine-intensive technology in the IR-2, reasoning that Britain’s slower adoption of interchangeable parts manufacturing was an efficient choice given its natural resource constraints.¹⁴⁹ For each case, I determine whether these types of circumstantial factors could nullify the validity of the GPT and LS mechanisms.

In tracing these mechanisms, I benefit from a wealth of empirical evidence on past industrial revolutions, which have been the subject of many interdisciplinary inquiries. Since the cases I study are well-traversed terrain, my research is primarily based on secondary sources.¹⁵⁰ I rely on histories of technology and general economic histories to trace how and when technological breakthroughs affected economic power balances. Notably, my analysis takes advantage of the application of formal statistical and econometric methods to assess the impact of significant technological advances, part of the “cliometric revolution” in economic history.¹⁵¹ Some of these works have challenged the dominant narrative of previous industrial revolutions. For instance, Nick von Tunzelmann found that the steam engine made minimal contributions to British productivity growth before 1830, raising the issue that earlier accounts of British industrialization “tended to conflate the economic significance of the steam engine with its early diffusion.”¹⁵²

I supplement these historical perspectives with primary sources. These include statistical series on industrial production, census statistics, discussions of engineers in contemporary trade journals, and firsthand accounts from commissions and study teams of cross-national differences in technology systems. In the absence of standardized measures of engineering education, archival evidence helps fill in details about GPT skill infrastructure for each of the cases. In the IR-1 case, I benefit from materials from the National Archives (United Kingdom), the British Newspaper Archive, and the University of Nottingham Libraries, Manuscripts, and Special Collections. My IR-2 case analysis relies on collections based at the Bodleian Library (United Kingdom), the Library of Congress (United States), and the University of Leipzig and on British diplomatic and consular reports.¹⁵³ In the IR-3 case analysis, the Edward A. Feigenbaum Papers collection, held at Stanford University, helps inform US-Japan comparisons in computer science education.

My research also benefits greatly from new data on historical technological development. I take advantage of improved datasets, such as the Maddison Project Database.¹⁵⁴ New ones, such as the Cross-Country Historical Adoption of Technology dataset, were also beneficial.¹⁵⁵ Sometimes hype about exciting new technologies influences the perceptions of commentators and historians about the pace and extent of technology adoption. More granular data can help substantiate or cut through these narratives. Like the reassessments of the impact of previous technologies, these data were released after the publication of the field-defining works on technology and power transitions in international relations. Making extensive use of these sources therefore provides leverage to revise conventional understandings.

When assessing these two mechanisms, one of the main challenges is to identify the key technological changes to trace. I take a broad view of technology that encompasses not just technical designs but also organizational and managerial innovations.¹⁵⁶ Concretely, I follow Harvey Brooks, a pioneer of the science and technology policy field, in defining technology as “knowledge of how to fulfill certain human purposes in a specifiable and reproducible way.”¹⁵⁷ The LS and GPT mechanisms both call attention to the outsized import of particular technical breakthroughs and their interactions with social systems, but they differ on which ones are more important. Therefore, a deep and wide understanding of advances in hardware and organizational practices in each historical period is required to properly sort them by their potential to spark LS or GPT trajectories.

This task is complicated by substantial disagreements over which technologies are leading sectors and GPTs. Proposed lists of GPTs often conflict, raising questions about the criteria used for GPT selection.¹⁵⁸ Reacting to the length of such lists, other scholars fear that “the [GPT] concept may be getting out of hand.”¹⁵⁹ According to one review of eleven studies that identified past GPTs, twenty-six different innovations appeared on at least one list but only three appeared on all eleven.¹⁶⁰

The LS concept is even more susceptible to these criticisms because the characteristics that define leading sectors are inconsistent across existing studies. Though most scholars agree that leading sectors are new industries that grow faster than the rest of the economy, there is marked disagreement on other criteria. Some scholars select leading sectors based on the criterion that they have been the largest industry in several major industrial nations for a period of time.¹⁶¹ Others emphasize that leading sectors attract significant investments in R&D.¹⁶² To illustrate this variability, I reviewed five key texts that analyze the effect of leading sectors on economic power transitions. Limiting the lists of proposed leading sectors to those that emerged during the three case study periods, I find that fifteen leading sectors appeared on at least one list and only two appeared on all five.¹⁶³

My process for selecting leading sectors and GPTs to trace helps alleviate concerns that I cherry-pick the technologies that best fit my preferred explanation. In each historical case, most studies that explicitly identify leading sectors or GPTs agree on a few obvious GPTs and leading sectors. To ensure that I do not omit any GPTs, I consider all technologies singled out by at least two of five key texts that identify GPTs across multiple historical periods.¹⁶⁴ I apply the same approach to LS selection, using the aforementioned list I compiled.
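
The inclusion rule described above (keep any technology named by at least two of the key texts) amounts to a simple count across lists. The sketch below illustrates that counting logic under stated assumptions. The function names `select_candidates` and `on_every_list`, the text labels, and the technology labels are hypothetical placeholders, and the actual lists are those compiled in the appendix.

```python
from collections import Counter


def select_candidates(lists_by_text, min_mentions=2):
    """Return technologies named by at least `min_mentions` of the texts.

    lists_by_text maps a text label to the technologies that text identifies.
    Each text counts at most once per technology.
    """
    counts = Counter(tech for techs in lists_by_text.values() for tech in set(techs))
    return sorted(tech for tech, n in counts.items() if n >= min_mentions)


def on_every_list(lists_by_text):
    """Return technologies that appear in every text's list."""
    return select_candidates(lists_by_text, min_mentions=len(lists_by_text))


# Hypothetical placeholder lists, for illustration only.
texts = {
    "text_1": ["tech_a", "tech_b"],
    "text_2": ["tech_a", "tech_c"],
    "text_3": ["tech_b", "tech_a"],
    "text_4": ["tech_d"],
    "text_5": ["tech_a", "tech_d", "tech_e"],
}

print(select_candidates(texts))  # technologies named by two or more texts
print(on_every_list(texts))      # technologies named by all five texts
```

A low threshold such as two of five errs toward inclusion, which guards against omitting a candidate. The defining criteria discussed next then do the work of excluding technologies that do not qualify.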

Following classification schemes that differentiate GPTs from “near-GPTs” and “multipurpose technologies,” I resolve many of the conflicts over what counts as a GPT or leading sector by referring to a set of defining criteria.¹⁶⁵ For instance, while some accounts include the railroad and the automobile as GPTs, I do not analyze them as candidate GPTs because they lack a variety of uses.¹⁶⁶ I support my choices with empirical methods for LS and GPT identification. To confirm certain leading sectors, I examine the rate of growth across various industry sectors. I also leverage recent studies that identify GPTs with patent-based indicators.¹⁶⁷ Taken together, these procedures limit the risk of omitting certain technologies while guarding against GPT and LS concept creep.¹⁶⁸ The qualitative analysis appendix outlines how I address other issues related to LS and GPT identification, including concerns about omitting important single-purpose technologies and scenarios when certain technological breakthroughs are linked to both LS and GPT trajectories.

These considerations underscore that taking stock of the key technological drivers is only the first step in the case analysis. To judge whether these breakthroughs actually brought about the impacts that are often claimed for them, it is important to carefully trace how these technologies evolved in close relation with societal systems.

As a complement to the historical case studies, this book’s research design includes a large-n quantitative analysis of the relationship between the breadth of software engineering skill formation institutions and computerization rates. This tests a key observable implication of GPT diffusion theory, using time-series cross-sectional data on nineteen advanced and emerging economies across three decades. I leave the more detailed description of the statistical methodology to chapter 6.

Summary

The technological fitness of nations is determined by how they adapt to the demands of new technical advances. I have developed a theory to explain how revolutionary technological breakthroughs affect the rise and fall of great powers. My approach is akin to that of an investigator tasked with figuring out why one ship sailed across the ocean faster than all the others. As though differentiating the winning ship’s route from possible sea-lanes in terms of trade wind conditions and course changes, I first contrast the GPT and LS trajectories with regard to the timing, phase, and breadth of technological change. Once the superior route has been mapped, attention turns to the attributes of the winning ship, such as its navigation equipment and sailors’ skills, that enabled it to take advantage of this fast lane across the ocean. In similar fashion, having set out the GPT trajectory as the superior route from technological revolution to economic leadership, my argument then highlights GPT skill infrastructure as the key institutional attribute that dictates which great power capitalizes best on this route.

3 The First Industrial Revolution and Britain’s Rise

Few historical events have shaken the world like the First Industrial Revolution (IR-1, 1780–1840). Extraordinary upheaval marked the contours and consequences of the IR-1. For the first time in history, productivity growth accelerated dramatically, allowing large numbers of people to experience sustained improvements in their living standards. Small towns transformed into large cities, new ideologies gathered momentum, and emerging economic and social classes reshaped the fabric of society. These changes reverberated in the international sphere, where the ramifications of the IR-1 included the transition to industrialized mass warfare, the decline of the absolutist state, and the birth of the modern international system.

Among these transformations, two phenomena stand out. The first is the remarkable technological progress that inaugurated the IR-1 period. Everything was changing in part because so many things were changing—water frames, steam engines, and puddling processes not least among them. The second is Britain’s rise to unrivaled economic leadership, during which it sustained productivity growth at higher levels than its rivals, France and the Netherlands. The following sections adjudicate the debates over the exact timeline of Britain’s industrialization, but there is no doubt that Britain, propelled by the IR-1, became the world’s most advanced economic power by the mid-nineteenth century.

No study of technological change and power transitions is complete without an account of the IR-1. For both the LS and GPT mechanisms, the IR-1 functions as a typical case that is held up as paradigmatic of technology-driven power transitions. The standard account in international relations scholarship attributes Britain’s industrial ascent to its dominance of innovation in the IR-1’s leading sectors, including cotton textiles, iron metallurgy, and steam power.¹ Present-day scholarship and policy discussions often draw upon stylized views of the IR-1, analogizing present developments in information technology and biotechnology to the effects of steam power and cotton textiles in the industrial revolution.²

A deeper inquiry into the IR-1 and Britain’s economic rise challenges many of these conventional views. First, it reveals that general-purpose transformations linked to advances in iron metallurgy diffused widely enough to significantly affect economy-wide productivity only after 1815—a timeline that aligns with the period when Britain significantly outpaced its rivals in industrialization. Other prominent advances, including the steam engine, made only limited contributions to Britain’s rise to industrial prominence in this period owing to a prolonged period of gestation before widespread adoption. Second, the IR-1 case also demonstrates that it was Britain’s advantage in extending mechanization throughout the economy, not monopoly profits from innovations in cotton textiles, that proved crucial to its industrial ascendancy. Third, the historical data illustrate that the dispersion of mechanical innovations across many sectors fueled British productivity growth. Across these three dimensions, the IR-1 case matches the GPT trajectory better than the LS trajectory.

Since no country monopolized innovations in metalworking processes and Britain’s competitors could also absorb innovations from abroad, why did Britain gain the most from this GPT trajectory? In all countries, as technical advances surged forward, institutional adjustments raced to cultivate the skills required to keep pace. Importantly, France and the Netherlands were competitive with Britain—and even surpassed it in some respects—in scientific research infrastructure and education systems for training expert engineers. These institutional settings in France and the Netherlands, however, produced knowledge and skills that were rather divorced from practical applications.

Britain’s comparative advantage rested on another type of skill infrastructure. It depended less on heroic innovators like James Watt, the famed creator of the modern steam engine, and more on competent engineers who could build and maintain new technological systems, as well as make incremental adaptations to implement these systems in many different settings.³ As expected by GPT diffusion theory, Britain benefited from education systems that expanded the base of mechanically skilled engineers and disseminated knowledge of applied mechanics. Britain’s competitors could not match its system for cultivating a common technical language in applied mechanics that encouraged the knowledge exchanges between engineers and entrepreneurs needed for advancing mechanization from one industry to the next.

To trace these mechanisms, I gathered and sorted through a wealth of evidence on the IR-1. Historical accounts served as the foundational materials, including general economic histories of the IR-1, histories of influential technologies and industries like the steam engine and the iron industry, country-specific histories, and comparative histories of Britain, France, and the Netherlands. I also benefited from contemporary assessments of the IR-1’s institutional features provided by trade journals, proceedings of mechanics’ institutes, recruitment advertisements published in local newspapers, and essays by leading engineers. This evidence stems from archival materials at the British Newspaper Archive, the National Archives (United Kingdom), and the University of Nottingham Libraries, Manuscripts, and Special Collections. Triangulating a variety of sources, I endeavored to back up my claims with statistical evidence in the form of industrial output estimates, patenting rates, and detailed biographical information on British engineers.

The assessment of the GPT and LS mechanisms against historical evidence from the IR-1 proceeds as follows. To begin, the chapter reviews Britain’s rise to industrial preeminence, which is the outcome of the case. Next, it sorts the key technological breakthroughs of the period by their potential to drive two types of trajectories—LS product cycles and GPT diffusion. I then assess whether Britain’s rise in this period is better explained by the GPT or LS mechanism, tracing the development of candidate leading sectors and GPTs in terms of impact timeframe, phase of relative advantage, and breadth of growth. If the GPT trajectory holds for this period, there should be evidence that Britain was better equipped than its competitors in GPT skill infrastructure. Another section evaluates whether the historical data support this expectation. Before concluding the chapter, I address alternative factors and explanations.

A Power Transition: Britain’s Rise

When did Britain ascend to industrial hegemony? The broad outlines of the story are well known. Between the mid-eighteenth century and the mid-nineteenth century, the industrial revolution propelled Great Britain to global preeminence. Although Britain did not boast the world’s largest economy—China held that title during this period—it did capitalize on the technologies of the industrial revolution to become “the world’s most advanced productive power.”⁴ France and the Netherlands, its economic rivals, did not keep pace with Britain’s productivity growth.

While both the LS and GPT models agree that Britain established itself as the preeminent industrial power in this period, a clearer sense of when this shift occurred is essential for testing the explanatory power of the LS and GPT mechanisms during this period. One common view of Britain’s industrialization, brought to prominence by Rostow, depicts an accelerated takeoff into sustained growth. Rostow’s timeline dates this takeoff to the last two decades of the eighteenth century.⁵ In alignment with this periodization, some scholars writing in the LS tradition claim that Britain achieved its industrial ascent by the late eighteenth century.⁶

A different perspective, better supported by the evidence that follows, favors a delayed timeline for Britain’s ascent to industrial preeminence. Under this view, Britain did not sustain economic and productivity advances at levels substantially higher than its rivals until the 1820s and after. To clarify the chronology of Britain’s industrial ascent, the following sections survey three proxies for productivity leadership: GDP per capita, industrialization, and total factor productivity.

GDP PER-CAPITA INDICATORS

Trend lines in GDP per capita, a standard proxy for productivity, confirm the broad outlines of Britain’s rise. Evidence from the Maddison Project Database (MPD) points to the decades after 1800, not before, as the key transition period (figure 3.1).⁷ These trends come from the 2020 version of the MPD, which updates Angus Maddison’s data and incorporates new annual estimates of GDP per capita in the IR-1 period for France, the Netherlands, and the United Kingdom.⁸ In 1760, the Netherlands boasted the world’s highest per-capita income, approximately 35 percent higher than Britain’s.⁹ The Dutch held this lead for the rest of the eighteenth century through to 1808, when Britain first overtook the Netherlands in GDP per capita. By 1840, Britain’s GDP per capita was about 75 percent higher than that of France and about 10 percent ahead of that of the Netherlands.¹⁰

Figure 3.1 Economic Power Transition in the IR-1. *Source*: Maddison Project Database, version 2020 (Bolt and van Zanden 2020).

It should be noted that GDP per-capita information for the early years of the IR-1 is sometimes missing or only partially available. For years prior to 1807, the MPD bases Dutch GDP per-capita estimates on data for just the Holland region, so the Dutch economy’s decline during this time could be an artifact of changes in data sources.¹¹ At the same time, to ensure that the MPD data can be used to provide accurate information on historical patterns of economic growth and decline, researchers have made adjustments to partial data series and consulted experts to assess their representativeness.¹² Furthermore, the Holland-based data in the early 1800s already indicated a decline in Dutch GDP per capita. Although data scarcity makes it difficult to mark out exactly when Britain’s GDP per capita surpassed that of the Netherlands, the MPD remains the best source for cross-country comparisons of national income in this period.

INDUSTRIALIZATION INDICATORS

Industrialization indicators depict a mixed picture of when Britain sustained leadership in economic efficiency. By one influential set of metrics compiled by economic historian Paul Bairoch, Britain’s per-capita industrialization levels had grown to 50 percent higher than those of France in 1800, from a position of near-equality with France and Belgium in 1750. For scholars who map the trajectories of great powers, these estimates have assumed a prominent role in shaping the timeline of British industrial ascendance.¹³ For instance, Paul Kennedy employs Bairoch’s estimates to argue that the industrial revolution transformed Britain into a different kind of world power.¹⁴

Further examination of Bairoch’s estimates qualifies their support for an accelerated timeline of Britain’s industrial ascendance. First, by limiting his definition of “industrial output” to manufacturing industry products, Bairoch excludes the contribution of notable sectors such as construction and mining, a distinction even he admits is “rather arbitrary.”¹⁵ Second, the gap between Britain and France in per-capita industrialization levels in 1800 still falls within the margin of error for Bairoch’s estimates.¹⁶

Moreover, a delayed timeframe is supported by industrialization measures that encompass more than the manufacturing industries. In 1700, the Netherlands had a substantially higher proportion of its population employed in industry (33 percent) compared to the United Kingdom (22 percent). In 1820, the proportion of people employed in UK industry had risen to 33 percent, higher than the corresponding Dutch rate of 28 percent.¹⁷ One expert on the pre-industrial revolution in Europe notes that the Netherlands was at least as industrialized as England, if not more so, throughout the eighteenth century.¹⁸ Lastly, aggregate industrial production trends map out a post-1815 surge in British industrialization, providing further evidence that Britain did not solidify its productivity advantage until decades into the nineteenth century.¹⁹

PRODUCTIVITY INDICATORS

Total factor productivity (TFP) indicators, which capture the efficiency with which production factors are converted into useful outputs, further back the delayed ascent story. As was the case with trends in aggregate industrial output, TFP growth in Britain did not take off until after 1820.²⁰ In truth, British TFP growth was very modest throughout the eighteenth century, averaging less than 1 percent per year.²¹

While the paucity of reliable data on total factor productivity in France hinders cross-national comparisons in this period, some evidence suggests that Britain did not surpass the Netherlands in TFP until after 1800.²² Historian Robert Allen estimated TFP in agriculture by calculating the ratio between actual output per worker and the output per worker predicted by a country’s available agricultural population and land. On this metric for the year 1800, the Netherlands ranked higher than Britain and all other European nations.²³ The Dutch also attained the highest absolute TFP in Europe for almost all of the eighteenth century.²⁴

Which periodization of Britain’s industrial ascent better reflects the empirical evidence? On balance, measures of per-capita GDP, industrialization levels, and total factor productivity support a deferred timeline for Britain’s industrial rise. This clarification of when an economic power transition occurred during the IR-1 provides a stable target to test the competing LS and GPT mechanisms.

Key Technological Changes in the IR-1

Before evaluating the LS and GPT mechanisms in greater depth, the technological elements of the IR-1 must be further specified. Hargreaves’s spinning jenny (1764), Arkwright’s water frame (1769), Watt’s steam engine (1769), Cort’s puddling process (1784), and many other significant technical advances emerged during the First Industrial Revolution. The most likely sources of GPT and LS trajectories can be identified with guidance from existing work that calls attention to key technologies and accepted criteria for these two categories. Narrowing the assessment of these two mechanisms to a limited set of technologies makes for a more viable exercise.

Candidate Leading Sectors: Cotton Textiles, Iron, and Steam Power

A strong degree of consensus on the leading sectors that powered Britain’s rise in the IR-1 makes it relatively easy to identify three candidate sectors: cotton textiles, iron, and steam power.²⁵ Among these, historians widely recognize the cotton textile industry as the original leading sector of the First Industrial Revolution.²⁶ New inventions propelled the industry’s rapid growth, as its share of total value added to British industry rose from 2.6 percent in 1770 to 17 percent in 1801.²⁷ In characterizing the significance of the cotton industry, Schumpeter went as far as to claim, “English industrial history can, in the epoch 1787–1842 … be almost resolved into the history of a single industry.”²⁸

If the cotton textile industry places first in the canon of the IR-1’s leading sectors, then the iron industry follows close behind. In their account of Britain’s rise, Modelski and Thompson single out these two major industries, employing pig iron production and cotton consumption as indicators of Britain’s leading sector growth rates.²⁹ According to the traditional view of the IR-1, the iron and cotton industries were the only two that experienced “highly successful, rapidly diffused technical change” before the 1820s.³⁰

I also evaluate the steam power industry as a third possible leading sector. A wide range of LS-based scholarship identifies steam power as one of the technological foundations of Britain’s leadership in the nineteenth century.³¹ Most of this literature labels only the steam engine itself as the leading sector, but since leading sectors are new industries, the steam engine–producing industry is the more precise understanding of the leading sector related to major advances in steam engine technology. Compared to the iron and cotton textile industries, it is much more uncertain whether the steam engine–producing industry, which experienced relatively slow growth in output and productivity, meets the analytical criteria for a leading sector during the IR-1 case.³² Still, I include the steam engine–producing industry as a potential leading sector, leaving it to the case analysis to further study its growth trajectory.

Candidate GPTs: Iron, Steam Engine, and the Factory System

Since the possible sources of GPT trajectories in the IR-1 are less established, I drew on previous studies that mapped out technological paradigms in this period to select three candidate GPTs: the steam engine, mechanization based on advances in iron machinery, and the factory system.³³ As a possible source of GPT-style effects, the steam engine is a clear choice. Alongside electricity and information and communications technology (ICT), it has been described as one of the “Big Three” GPTs, appearing in nearly all catalogs of GPTs.³⁴ Here the emphasis is on the capacity of steam engines to transform a wide variety of industrial processes across many sectors, as opposed to the potential growth of the steam engine–producing industry.

Of the two paradigmatic industries of the IR-1, cotton and iron, the latter was a more plausible driver of GPT-style effects for Britain. As the demand for iron-made machinery grew, iron foundries’ development of new generations of machine tools, such as cylinder-boring machines, contributed to the creation of a mechanical engineering industry.³⁵ This spurred the mechanization of production processes in a wide range of industries, including agriculture, food processing, printing, and textiles.³⁶ Although both cotton textiles and iron were fast-growing industries, developments in iron better resembled a “motive branch” driving pervasive effects across the economy.³⁷

In addition, the late eighteenth century saw the emergence of centralized factories, which significantly increased the scale of goods production. The factory system offered the potential to change the techniques of production across many industries. One widely cited classification scheme for GPTs picks out the factory system as an “organizational GPT” in the late eighteenth to early nineteenth century period.³⁸ Other scholars describe this organizational innovation as “one of the most fundamental changes of ‘metabolism’ in the Industrial Revolution.”³⁹

I also considered but ultimately decided against including developments in railroads as a candidate GPT. Among five core texts that classify GPTs across many historical periods, at least two highlighted the significance of the railroad to the IR-1.⁴⁰ In my view, the railroad represented a disruptive advance, but it did not acquire the variety of uses to qualify as a GPT. Railways carried many types of freight and made new business models possible, but their function was limited to transport.⁴¹

Sources of LS and GPT Trajectories

Table 3.1 recaps the potential technological sources for both the GPT and LS mechanisms. It is important to clarify three points about the sorting process. First, it is notable but not surprising that the candidate leading sectors and GPTs draw from similar technological wellsprings. Both mechanisms agree that some inventions, like Cort’s puddling process for making wrought iron, mattered much more than others in terms of their impact on the economic balance of power.

Table 3.1 Key Sources of Technological Trajectories in the IR-1

Candidate Leading Sectors	Candidate GPTs
Cotton textile industry	Factory system
Iron industry	Mechanization
Steam engine–producing industry	Steam engine

Where the mechanisms separate is in how this process transpired. Cort’s puddling process and other ironmaking innovations, under the GPT model, are expected to follow an impact pathway characterized by three features: an extended lag before they affect productivity growth, the spread of mechanization across the economy, and widespread complementary innovations in many machine-using industries. Under the LS model, the same technological sources are expected to affect economic growth in a way that is lopsided in the early stages of development, fueled by monopolizing iron exports, and limited to technological innovations in the iron industry.

Second, it is still useful to classify three distinct candidate GPTs in the IR-1, despite the fact that developments in factory systems, mechanization, and steam engines were mutually reinforcing in many respects. Steam engines depended on the transition from hand-tool processes to machinery-based production systems; at the same time, the impact of steam engines on coal mining was to boost the iron industry, spurring mechanization. Yet a number of historians distinguish the expansion of mechanization in the British industrial revolution from transformations linked to the steam engine, arguing that the latter’s economic impact materialized much later than the former.⁴² Thus, while these candidate GPT trajectories are interconnected, it is still possible to locate various GPTs at different stages of their life cycle.

Third, not all of these technological changes necessarily had a decisive impact on Britain’s capacity to sustain higher productivity levels than its rivals during the period of interest. They are labeled as candidates for a reason. As this chapter will show, the steam engine did not achieve widespread diffusion until after Britain had already established economic leadership. When subjected to more rigorous empirical analysis, developments in some technologies may not track well with the proposed LS and GPT trajectories for this period.

GPT vs. LS Trajectories in the IR-1

Spelling out possible sources of technological trajectories in the IR-1 provides a bounded terrain for testing the validity of the GPT and LS mechanisms. By leveraging differences between the two mechanisms with respect to impact timeframe, phase of relative advantage, and breadth of growth, I derive three sets of opposing predictions for how technological changes translated into an economic power transition in this period. I then assess whether, and to what extent, the developments in the IR-1 supported these predictions.

OBSERVABLE IMPLICATIONS RELATED TO IMPACT TIMEFRAME

When did the revolutionary technologies of the IR-1 disrupt the economic balance of power? If the impact timeframe of the LS mechanism holds, then radical technical advances in the cotton textile, iron, and/or steam engine–producing industries should have substantially stimulated British economic growth shortly after the emergence of major technological advances in the 1760s and 1770s.⁴³ Accordingly, scholars theorize that leading sectors propelled Great Britain to industrial superiority in the late eighteenth century.⁴⁴ In line with this conception of a rapid timeline, Modelski and Thompson expect that the growth of two lead industries, cotton and iron, peaked in the 1780s.⁴⁵

On the other hand, if the GPT mechanism was operational, the impact of major technological breakthroughs on Britain’s industrial ascent should have arrived on a more gradual timeline. Key advances tied to mechanization, steam power, and the factory system emerged in the 1770s and 1780s. Given that GPTs require a long period of delay before they diffuse and achieve widespread adoption, the candidate GPTs of the IR-1 should not have had substantial economy-wide repercussions until the early decades of the nineteenth century and after. I use the year 1815 as a rough cut-point to separate the accelerated impact timeframe of leading sectors from that of GPTs in this period.

OBSERVABLE IMPLICATIONS RELATED TO THE PHASE OF RELATIVE ADVANTAGE

The LS mechanism places high value on the innovation phase of technological change. Where major breakthroughs arise is key. Accordingly, Britain’s capacity to pioneer major technological advances should explain economic growth differentials in the IR-1. Concretely, the LS mechanism expects that Britain’s rise was fueled by its dominance of innovation in the cotton textile, iron, and steam engine–producing industries, as well as the resultant monopoly rents from exports in these sectors.

The GPT mechanism emphasizes a less-celebrated phase of technological change. Where innovations diffuse is key. Differentials in the rate and intensiveness of GPT adoption generate the gap between an ascending industrial leader and other competitors. The GPT mechanism suggests that Britain’s rise to industrial preeminence can be traced to its superior ability to diffuse generic technological changes across the economy.

OBSERVABLE IMPLICATIONS RELATED TO BREADTH OF GROWTH

The last set of observable implications relates to the breadth of growth during the IR-1. As illustrated in the descriptions here of the candidate leading sectors, many accounts attribute Britain’s industrial ascent to a narrow set of critical advances.⁴⁶ In one of the first texts dedicated to the investigation of technology and international relations, William Ogburn declared, “The coming of the steam engine … is the variable which explains the increase of Britain as a power in the nineteenth century.”⁴⁷ By contrast, according to GPT diffusion theory, Britain’s rise to industrial preeminence came from the advance of GPTs through many linked sectors.

Taken together, these three sets of diverging predictions guide my assessment of the GPT mechanism against the LS mechanism. I make expectations specific to the IR-1 case by using the relevant information on particular technologies and the timeline of British industrialization. Table 3.2 lays out the specific, testable predictions that provide the framework of evaluation in the following sections.

Table 3.2 Testable Predictions for the IR-1 Case Analysis

Prediction 1: LS (impact timeframe)	The cotton textile, iron, and/or* steam engine–producing industries made a significant impact on Britain’s rise to industrial preeminence before 1815.
Prediction 1: GPT	Mechanization, the steam engine, and/or the factory system made a significant impact on Britain’s rise to industrial preeminence only after 1815.
Prediction 2: LS (relative advantage)	Innovations in cotton textile, iron, and/or the steam engine–producing industries were concentrated in Britain.
Prediction 2: LS	British advantages in the production and exports of textiles, iron, and/or steam engines were crucial to its industrial superiority.
Prediction 2: GPT	Innovations in iron, the steam engine, and/or the factory system were not concentrated in Britain.
Prediction 2: GPT	British advantages in the diffusion of mechanization, steam engines, and/or the factory system were crucial to its industrial superiority.
Prediction 3: LS (breadth of growth)	Productivity growth in Britain was limited to the cotton textile, iron, and/or steam engine–producing industries.
Prediction 3: GPT	Productivity growth in Britain was spread across a broad range of industries linked to mechanization, the steam engine, and/or the factory system.
* The operator “and/or” links all candidate leading sectors and GPTs because it could be the case that only some of these technologies drove the trajectories of the period.

Impact Timeframe: Delayed Surge vs. Fast Rise of British Industrialization

The painstaking reconstruction of temporal chronology is at the heart of tracing mechanisms. Tremendous technological changes occurred during this period, but when exactly did they make their mark on Britain’s industrial superiority? The period when significant technological innovations emerge often does not match up with the time when their impacts are felt. Unfortunately, when drawing lessons from the IR-1 on the effect of technology on international politics, scholars have conflated the overall significance of certain technologies with near-immediate impact.⁴⁸ Establishing a clear timeline of when technological changes catalyzed a shift in productivity leadership during the IR-1 is therefore an important first step in comparing the LS and GPT mechanisms.

DIVERGING TIMELINES: COTTON VS. IRON

Time-series data on the output growth of twenty-six industries that accounted for around 60 percent of Britain’s industrial production help differentiate the growth schedules of the cotton textiles and iron industries. According to these data, the major upswing in British industrialization took place after 1815, when the aggregate growth trend increased from 2 percent to a peak of 3.8 percent by 1825. In line with the expectations of the LS model, the cotton textile industry grew exceptionally fast following major technological innovations in the 1760s, but from the 1780s there was a deceleration in the output growth of cotton textiles. Based on the relatively early peak of the cotton industry’s expansion, David Greasley and Les Oxley conclude that “it appears unlikely that cotton played the major role in the post-1815 upswing in British industrialization.”⁴⁹

Following a completely different trajectory, growth in iron goods was more in line with the GPT model. Starting in the 1780s, the growth rate of the British iron industry accelerated to a peak of about 5.3 percent in the 1840s.⁵⁰ “Compared to cotton textiles, change in iron was gradual, incremental, and spread out over a longer period of time,” Espen Moe writes.⁵¹ With a limited role for cotton, the gradual expansion of the iron industry led Britain’s post-1815 industrial surge, as its trend output tracked much more closely with that of aggregate industry. In sum, the cotton industry followed the growth path of a leading sector, whereas developments in the iron industry reflected the impact timeline of a GPT.

The timing of Britain’s mechanization, linked to the expanded uses of iron in machine-making, also aligned with the GPT trajectory. The first metalworking tools for precision engineering, including Wilkinson’s boring mill of 1774 and Maudslay’s all-iron lathe, appeared in the late eighteenth century, but they would remain in a “comparatively rudimentary state” until about 1815.⁵² According to accounts of qualified engineers and the 1841 Select Committee on Exportation of Machinery, over the course of the next two decades improvements and standardization in such machine tools ushered in a “revolution” in machine-making.⁵³ The gradual evolution of the mechanical engineering industry provides additional support for a delayed impact timeframe for mechanization. According to British patent data from 1780 to 1849, the share of mechanical engineering patents among the total number of patents increased from an average of 18 percent in the decade starting in 1780 to a peak of 34 percent in the one starting in 1830.⁵⁴

DELAYED, OUT-OF-PERIOD EFFECTS: STEAM ENGINE AND FACTORY SYSTEM

Compared to mechanization, the steam engine had not diffused widely enough through Britain’s economy by the mid-nineteenth century to make a substantial impact on overall industrial productivity. Detailed investigations of steam engine adoption have forced a reassessment of commonly held assumptions about the rapid impact of steam power on British productivity growth.⁵⁵ According to one growth accounting analysis, which compares the impact of steam against water power as a close substitute source of power, steam power’s contribution to British productivity growth was modest until at least the 1830s and most influential in the second half of the nineteenth century.⁵⁶ This revised impact timeframe conflicts with international relations scholarship, which advances faster trajectories for steam’s impact as a leading sector.⁵⁷

Steam engine adoption was slow. In 1800, thirty years after Watt patented his steam engine, there were only about thirty-two engines operating in Manchester, which was a booming center of industrialization.⁵⁸ Even into the 1870s, many important sectors in the British economy, such as agriculture and services, were virtually unaffected by steam, as most of steam power’s applications were concentrated in mining and in cotton textiles.⁵⁹ During the period when LS accounts expect its peak growth, the steam engine could not claim generality of use.

The process by which the steam engine gained a broad variety and range of uses entailed complementary innovations that followed many years after the introduction of Watt’s steam engine. It took sixty years for steam to become the prime driver of maritime transport; that was made possible only after cumulative enhancements to the power of steam engines and the replacement of paddle wheels by screw propellers, which increased the speed of steam-powered ships.⁶⁰ Watt’s original low-pressure design engines consumed large amounts of coal, which hindered widespread adoption. After 1840, aided by inventions such as the Lancashire boiler and discoveries in thermodynamics, it became economically viable to deploy steam engines that could handle higher pressures and temperatures.⁶¹ In sum, steam power may be the quintessential example of the long delay between the introduction of a GPT and significant economy-wide effects.

It is worth assessing whether interconnections between developments in steam and those in iron and cotton give grounds for an earlier impact trajectory.⁶² Yet both forward and backward linkages were limited in the early stages of the steam engine’s evolution. Regarding the latter, the steam engine–producing industry did not substantially stimulate the iron industry’s expansion. In the late 1790s, at a peak in sales, Boulton and Watt steam engines consumed less than 0.25 percent of Britain’s annual iron output.⁶³ Forward linkages to textiles, the most likely sector to benefit from access to cheaper steam power, were also delayed. Steam-powered inventions in textiles did not overtake water-powered textile processes until after 1830.⁶⁴ Of course, over the long run steam power was a critical technological breakthrough that changed the energy budget of the British economy.⁶⁵ However, for investigating the mechanisms that facilitated Britain’s rise to become the clear productivity leader in the First Industrial Revolution, the steam engine played a modest role and most of its effects were out-of-period.

A similar timeline characterizes the progression of the factory system, another candidate GPT considered for this period. The factory system diffused slowly and took hold in only a limited number of trades during the time when Britain was establishing its industrial preeminence. The British textile industry, as the earliest adopter of this organizational innovation, had established nearly five thousand steam- or water-powered factories by the 1850s.⁶⁶ Other industries, however, were much slower to adopt the factory system. In the first decades of the nineteenth century, small workshops and domestic production still dominated the metal trades as well as other hardware and engineering trades.⁶⁷

Moreover, factories were still relatively small even into the mid-nineteenth century, and some industries adopted a mixed factory system in which many processes were outsourced to household workers.⁶⁸ It was not until steam power overtook water power in the 1830s and 1840s as a source of power in factories that the subsequent redesigns of factory layouts led to large gains in productivity.⁶⁹

What does this clarified chronology of technological impacts in the IR-1 mean for the explanatory power of the GPT and LS mechanisms? Of the three candidate leading sectors, only cotton textiles, which expanded rapidly and peaked in terms of output growth in the 1780s and 1790s, followed the impact timeframe of a leading sector. As figure 3.2 shows, by 1814 British cotton exports had already surpassed 75 percent of the value they would attain in 1840. Yet Britain sustained productivity growth rates at higher levels than its rivals only in the first decades of the nineteenth century. Thus, the period when cotton should have made its greatest impact on Britain’s industrial ascent does not accord with the timeline of Britain’s industrialization surge.

The hurried timeline of the LS mechanism contrasts with the more delayed impact of other technological advances. As predicted by the GPT mechanism, all three candidate GPTs—mechanization, the steam engine, and the factory system—had little impact on Britain’s industrial rise until after 1815. In fact, the diffusion timelines for the steam engine and factory system were so elongated that their impact on Britain’s rise to industrial preeminence was limited in this period. In 1830, steam engine adoption, as measured by total horsepower installed, was only one-quarter of the 1840 level, whereas by 1830 the level of iron production had reached about 50 percent of its corresponding value in 1840 (see figure 3.2). This is consistent with the steady expansion of mechanization across industry in the early decades of the 1800s as the GPT trajectory most attuned with the timing of Britain’s rise to economic leadership.

Figure 3.2 Technological Impact Timeframes in the IR-1. *Note*: British cotton exports, iron production, and steam engine adoption over time. *Source*: Robson 1957, 331–35; Mitchell 1988, 280–81; Crafts 2004, 342.
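
The timing comparison above can be restated as a small calculation. The sketch below is illustrative only. It records the three shares of the 1840 level quoted in the text (cotton exports by 1814, iron production by 1830, steam horsepower in 1830) and checks which series had already reached a large share of its 1840 level by the 1815 cut-point. The `Observation` structure, the `early_riser` helper, and the 50 percent threshold are assumptions introduced for illustration; they are not drawn from the underlying data sources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    series: str
    year: int             # year of the observation quoted in the text
    share_of_1840: float  # level in that year as a fraction of the 1840 level


# Figures as quoted in the text: "surpassed 75 percent", "about 50 percent",
# and "one-quarter" of the corresponding 1840 values.
observations = [
    Observation("cotton exports", 1814, 0.75),
    Observation("iron production", 1830, 0.50),
    Observation("steam horsepower installed", 1830, 0.25),
]

CUT_POINT = 1815   # rough cut-point used in the text
THRESHOLD = 0.50   # hypothetical threshold, chosen only for this illustration


def early_riser(obs: Observation) -> bool:
    """True if the series reached the threshold share of its 1840 level by the cut-point."""
    return obs.year <= CUT_POINT and obs.share_of_1840 >= THRESHOLD


for obs in observations:
    label = "early (LS-style) rise" if early_riser(obs) else "later rise"
    print(f"{obs.series}: {obs.share_of_1840:.0%} of 1840 level by {obs.year} -> {label}")
```

Under these assumptions only cotton exports count as an early riser. The rule is deliberately crude: it cannot distinguish iron from steam, which the text separates by noting that iron had reached about half of its 1840 level by 1830 while steam had reached only a quarter.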

Phase of Relative Advantage: Diffusion of Iron vs. Monopoly Profits from Cotton

Thus far, the empirical evidence has presented a bird’s-eye view of the overall timeline of technological change and industrialization in the IR-1, but there are two other dimensions on which the GPT and LS trajectories diverge. According to the expectations of the LS mechanism, the phase of technological development central to Britain’s relative economic rise was its dominance of key innovations in cotton textiles, iron, or the steam engine. The GPT mechanism predicts, in contrast, that Britain’s advantage in the diffusion of mechanization, the steam engine, or the factory system was the key driver.

The rest of this section tests two sets of predictions derived from the diverging assumptions of the two mechanisms. First, regarding the geographic clustering of major technological breakthroughs in the IR-1, I assess whether innovations in candidate leading sectors and GPTs were concentrated in Britain. Next, regarding the comparative consequences of these technologies, I evaluate whether Britain’s industrial superiority drew more from its advantages in the production and exports of the IR-1’s leading sectors or from its advantage in the diffusion of the IR-1’s GPTs.

INNOVATION CLUSTERING IN THE IR-1’S BREAKTHROUGHS

Did Britain dominate innovation in the leading sectors of the IR-1? At first glance, there is no question that radical advances in candidate leading sectors clustered in Britain. This list includes Watt’s steam engine, Arkwright’s water frame, Cort’s puddling process, and many more. Per one analysis of 160 major innovations introduced during the nineteenth and twentieth centuries, Britain was home to 44 percent of major innovations from 1811 to 1849—a rate that was double that of the closest competitor (the United States at 22 percent).⁷⁰

Further investigation into British superiority in technological innovation paints a more mixed picture. According to another list of technical advances by country of origin, Britain accounted for only 29 percent of major innovations in the years from 1826 to 1850, a period that corresponds to when it cemented its productivity leadership.⁷¹ Moreover, the European continent introduced many significant innovations, including the Jacquard loom, mechanical flax spinning, chlorine bleaching, the Leblanc soda–making process, and the Robert continuous papermaking machine.⁷² France, in particular, generated many of the major industrial discoveries, such as in chemicals, clocks, glass, papermaking, and textiles.⁷³

Thus, some scholars argue that Britain’s comparative edge was in more incremental improvements. Reflecting on technological creativity in the IR-1, economic historian Joel Mokyr argues, “Britain seems to have no particular advantage in generating macroinventions … the key to British technological success was that it had a comparative advantage in microinventions.”⁷⁴ A proverb from the time captured this distinction: “For a thing to be perfect it must be invented in France and worked out in England.”⁷⁵ This suggests that digging deeper into the different phases of technological development can help uncover the roots of Britain’s industrial leadership.

MONOPOLY PROFITS VS. DIFFUSION DEFICIT

First, as was the case with the period when they made their impact, developments in cotton and iron followed very different paths with respect to the phase of technological change that determined economic differentials. Britain’s cotton textile industry, the most likely source of monopoly profits, grew faster than other industries before 1800, and it sold most of its goods abroad. Technological innovations such as the spinning jenny and the water frame triggered exponential increases in the efficiency of cotton production, and Britain’s cotton output increased by 2,200 percent from 1770 to 1815.⁷⁶ From 1750 to 1801, cotton’s share of Britain’s major exports increased from 1 percent to 39.6 percent.⁷⁷
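
To put the cumulative output figure on an annual footing, a rough conversion is possible. The calculation below is a sketch under two assumptions that are not in the source: that an increase of 2,200 percent means output in 1815 was 23 times its 1770 level, and that growth compounded at a constant rate over the 45 years. The symbols $g$ (implied annual growth rate) and $Q_t$ (cotton output in year $t$) are introduced here only for illustration.

```latex
\[
g \;=\; \left(\frac{Q_{1815}}{Q_{1770}}\right)^{1/45} - 1
  \;=\; 23^{1/45} - 1
  \;\approx\; 0.072
\]
```

Under these assumptions the implied average rate is roughly 7 percent per year. Because cotton output growth decelerated from the 1780s, this average understates the early surge and overstates the later pace.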

Certainly, the growth of British cotton exports was remarkable, but what was the impact of associated monopoly rents on overall growth differentials? Supported by improved quantitative estimates of the cotton industry’s impact on the British economy, historians generally accept that the cotton industry was much more significant for enhancing Britain’s trade balance than for boosting its economic productivity.⁷⁸ According to one estimate, between 1800 and 1860, the cotton industry accounted for 43 percent of the threefold increase in the value of exports but only 8 percent of the threefold increase in national income.⁷⁹
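
The contrast between the two shares can be made concrete with a short calculation. This is a minimal sketch, assuming that a "threefold increase" means the level in 1860 was three times the level in 1800, so that the increase equals two starting levels. The function name `contribution` and its parameters are hypothetical and used only for illustration.

```python
def contribution(growth_multiple: float, share_of_increase: float) -> float:
    """Sector's contribution to growth, expressed as a multiple of the starting level."""
    increase = growth_multiple - 1.0  # a threefold rise adds 2.0 starting levels
    return share_of_increase * increase


# Shares quoted in the text for 1800-1860: 43 percent of the increase in
# export value, 8 percent of the increase in national income.
cotton_in_exports = contribution(3.0, 0.43)
cotton_in_income = contribution(3.0, 0.08)

print(f"Cotton added {cotton_in_exports:.2f}x the 1800 export value")
print(f"Cotton added {cotton_in_income:.2f}x the 1800 national income")
```

On this reading, cotton added about 0.86 times the 1800 value of exports but only about 0.16 times the 1800 level of national income, which is the gap between trade impact and overall economic impact that the estimate highlights.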

Overall, exports constituted a small proportion of British economic activity during the IR-1. From 1770 to 1841, British exports as a percentage of overall industrial demand increased only from 13 to 16 percent.⁸⁰ Now, these figures probably underrate trade as a critical engine of growth for Britain in the IR-1, as they ignore gains from the reinvestment of profits from overseas trade.⁸¹ But the impact of reinvestment has been challenged, and it is not apparent why reinvestments from exports were more important than reinvestments from the profits generated by domestic production.⁸²

The iron industry’s impact on Britain’s economic rise did not run through monopoly profits from innovation. From 1794 to 1796, British ironmakers contributed 11 percent of Britain’s manufacturing exports. This proportion actually declined to just 2 percent by the 1814–1816 period and stayed around that rate into the 1830s.⁸³ It is also questionable whether Britain held a relative advantage in iron exports during the late eighteenth century, which is when LS accounts expect monopoly profits to drive British industrialization.⁸⁴ In fact, British industries continued to rely on imports of high-grade iron from Sweden and Russia well into the nineteenth century.⁸⁵

An alternative pathway, captured by the GPT trajectory, posits that Britain’s advantage came from the diffusion of iron machinery advances across a wide range of sectors. To trace this trajectory, it is necessary to pay more attention to what a prominent historian of the IR-1 calls one of the astonishing things about the phenomenon: the gap between “innovation as ‘best practice’ technique and the diffusion of innovation to become ‘representative’ technique.”⁸⁶

Britain was more successful than its industrial rivals in the diffusion of mechanization. Contemporary observers from the European continent often remarked upon Britain’s ability to bridge this gap between best practice and representative practice.⁸⁷ Writing in 1786 in their Voyages aux Montagnes, French observers F. and A. de la Rochefoucauld-Liancourt, commenting on Britain’s relative advantage in the widespread adaptation of the use of iron, noted

the great advantage [their skill in working iron] gives them as regards the motion, lastingness and accuracy of machinery. All driving wheels and in fact almost all things are made of cast iron, of such a fine and hard quality that when rubbed up it polishes just like steel. There is no doubt but that the working of iron is one of the most essential of trades and the one in which we are most deficient.⁸⁸

But France’s deficiency in iron machinery was not a product of its lack of access to key innovations. In fact, France was the world’s center of science from the late eighteenth century until the 1830s.⁸⁹ Rather, as the following quote illustrates, Britain’s industrial rivals fell behind in “diffused average technology” and the “effective spread of technical change more widely.” Economic historian Peter Mathias writes:

It is remarkable how quickly formal knowledge of “dramatic” instances of new technology, in particular steam engines, was diffused, and how quickly individual examples of “best-practice” technology in “show piece” innovations were exported. The blockage lay in the effective spread of technical change more widely—diffused average technology rather than single instances of best-practice technology in “dramatic” well-publicized machines.⁹⁰

Advances in iron metallurgy played a crucial role in a GPT trajectory that spread from a sector that improved the efficiency of producing capital goods. The GPT trajectory unfolds as the technology becomes more general-purpose through interactions between the upstream capital goods sector and the user industries that enlarge the range of applications. Rosenberg’s depiction of this type of system highlights the nineteenth-century American machine tool industry as the innovative capital goods sector.⁹¹ In this case, Britain’s metal-processing works were the crucial wellspring. Specifically, technical advances in iron fed into metalworking industries from which broadly similar production processes diffused over a large number of industries.⁹² Maxine Berg, a professor of history at the University of Warwick, pinpoints these industries as the “prime mechanism for technological diffusion.”⁹³

Scholars also identify Watt’s improved steam engine as a potential source of both LS- and GPT-based effects. Here I focus on testing the LS prediction about the steam engine–producing industry because the previous section showed that the steam engine and the factory system, as candidate GPTs, diffused too slowly to make a meaningful impact on the economic power transition in the IR-1.

It is tough to make a case that the growth of the steam engine–producing industry generated a substantial source of monopoly profits for Britain. Equipped with an exclusive patent, James Watt and Matthew Boulton set up a firm in 1775 to sell steam engines.⁹⁴ In the period from 1775 to 1825, however, the firm sold only 110 steam engines to overseas customers.⁹⁵ By 1825, France and the United States were manufacturing the Watt engine and high-pressure engines at more competitive prices, and overseas demand declined sharply.⁹⁶ Thus, the international sales history of this firm severely weakens the significance of the monopoly profits associated with the innovation of the steam engine.⁹⁷

In sum, the evidence from this section supports two conclusions. First, British advantages in the production and export of iron, steam engines, and cotton textiles (the best representative of the LS trajectory) had muted effects on its overall industrialization and productivity advances. Second, the contributions of technological breakthroughs in iron metallurgy and steam power to Britain’s industrial rise track better with the GPT mechanism, based on relative advantages in widespread technological diffusion as opposed to monopoly profits from innovation.

Breadth of Growth: Complementarities of Iron vs. Spillovers from Cotton Textiles

The breadth of growth in the IR-1 is the last dimension on which the LS and GPT trajectories disagree. Was Britain’s industrial rise driven by technological changes confined to a narrow range of leading sectors, or was it based on extensive, complementary innovations that enabled the spread of GPTs? Making use of data on sectoral sources of productivity growth, trade flows, and patents, I evaluate these competing propositions about the breadth of technological change in the industrial revolution.

WIDESPREAD PRODUCTIVITY GROWTH

In differentiating between the narrow view and the broad view of technical change during the IR-1, a natural starting point is to estimate the contribution of various industries to British productivity growth. Deirdre McCloskey’s calculations of sectoral contributions to productivity growth support the broad view. Though cotton accounted for a remarkable 15 percent of Britain’s total productivity growth, nonmodernized sectors still drove the lion’s share (56 percent) of productivity gains.⁹⁸

Manufacturing trade data provide another testing ground. If other manufacturing industries outside of textiles and iron were technologically stagnant during the first fifty years of the nineteenth century, then British competitiveness in these industries should decline relative to textiles and iron. The narrow view implies that Britain should have imported other manufactures. Peter Temin’s analysis of British trade data, however, finds the opposite. Throughout the first half of the nineteenth century, British manufacturing exports kept pace with the increase in cotton exports.⁹⁹ In a wide range of manufactures, such as arms and ammunition, carriages, glass, and machinery and metals, Britain held a clear comparative advantage. This evidence points to a general pattern of change that spanned industries. “The spirit that motivated cotton manufactures extended also to activities as varied as hardware and haberdashery, arms, and apparel,” Temin concludes.¹⁰⁰

The patent record also depicts a landscape of extensive technological change.¹⁰¹ From 1780 to 1840, about 80 percent of all patented inventions came from outside the textiles and metals industries.¹⁰² Per Christine MacLeod’s data on British patents covering the 1750–1799 period, most capital goods patents originated from sectors outside of textile machinery and power sources.¹⁰³ As summed up by historian Kristine Bruland, the historical evidence supports “the empirical fact that this was an economy with extensive technological change, change that was not confined to leading sectors or highly visible areas of activity.”¹⁰⁴

GPTS AND COMPLEMENTARY INNOVATIONS

At this point, indicators of the multisectoral spread of innovation in the IR-1 should not be sufficient to convince a skeptical reader of the GPT mechanism’s validity. Broad-based growth could be a product of macroeconomic factors, such as sound fiscal and monetary policy or labor market reforms, rather than a GPT trajectory.¹⁰⁵ Proving that the dispersion of technological change in Britain’s economy reflected a GPT at work requires evidence that connects this broad front to mechanization.¹⁰⁶

Input-output analysis, which sheds light on the linkages between industries, suggests that improvements in the working of iron had broader economic significance. To better understand the interrelationships among industries during the industrial revolution, Sara Horrell, Jane Humphries, and Martin Weale constructed an input-output table for the British economy in 1841. Across the seventeen industries included in the analysis, the two industries most closely associated with mechanization—metal manufacture and metal goods—scored the highest on combined backward and forward linkages.¹⁰⁷ These two domains were “lynchpins of linkage effects.”¹⁰⁸

Patent indicators confirm these results. When patents are grouped according to standard industry taxonomies, the resulting distribution shows that the textile industry accounted for 15 percent of the patents issued between 1711 and 1850, making it the most inventive industry in aggregate terms.¹⁰⁹ However, when patents are sorted by general techniques as opposed to industry sectors, the same data reveal the underlying driving force of mechanical technology: it is linked to almost 50 percent of all British patents during this period.¹¹⁰

Along all three dimensions of technological trajectories in the IR-1, the process-tracing evidence bolsters the validity of the GPT mechanism. First, slower-moving developments in mechanization lined up with a delayed timeline of Britain’s industrialization. Other candidate leading sectors and GPTs either peaked too early (cotton) or got started too late (steam engine, factory system). Second, Britain gained its industrial dominance from a relative advantage in widespread adoption of iron metalworking and linked machinery. Third, the benefits from this GPT advantage circulated throughout the economy, rather than remaining concentrated in the iron industry.

The standard explanation of how the IR-1 gave rise to a power transition, as captured by the LS mechanism, analyzes technological change at the level of industries that grow faster than others. The historical evidence reveals the limitations of these industry taxonomies. Instead, advantages in the diffusion of production machinery—a general pattern of change that extended across a wide range of economic activities—propelled Britain to industrial dominance.¹¹¹

Institutional Complementarities: GPT Skill Infrastructure in the IR-1

Having mapped Britain’s industrial rise to a GPT trajectory linked to mechanization, I still need to explain why Britain was best positioned to exploit this trajectory. If other countries at the technological frontier could also cultivate mechanical innovations at home and absorb them from abroad, why were Britain’s competitors unable to benefit from the diffusion of metalworking processes to the same extent? This section supports an explanation based on Britain’s institutional competencies in widening the pool of engineering skills and knowledge linked to mechanization.

Which types of institutions for skill provision were most conducive to national success in the IR-1? One common refrain is that Britain’s leadership was rooted in the genius of individual innovators like James Watt, and such genius did not transfer as quickly across borders during the IR-1.¹¹² Though recent scholarship has weakened this view, many influential histories center on the “heroic” inventors of the industrial revolution.¹¹³ Consistent with the LS template, this view focuses on the institutions that helped drive heroic invention in Britain, such as the development of a patent system.¹¹⁴

The pathway by which mechanization propelled Britain’s industrial ascent, established as a GPT trajectory in the previous section, emphasizes another set of institutions for skill formation. In line with GPT diffusion theory, Britain owed its relative success in the IR-1 to mechanics, instrument-makers, and engineers who could build machines according to blueprints and improve upon them depending on the application context. Under this view, the institutions that trained the “tweakers” and “implementers,” rather than those that cultivated genius inventors, take center stage.¹¹⁵

Widening the Base of Mechanical “Tweakers” and “Implementers”

At first, rapid advances in precise metalworking exposed a skills shortage in applied mechanics. Beginning in the 1770s, a cascade of recruitment advertisements in local newspapers sought out an “engine-maker” or a “machine-maker.”¹¹⁶ Reflecting on this skills mismatch, the president of Britain’s Institution of Civil Engineers stated that the use of cast iron in machine parts “called for more workmen than the millwright class could supply.”¹¹⁷

A number of institutional adjustments helped Britain meet this demand for mechanically skilled tweakers and implementers. Initially, Britain benefited from a flexible apprenticeship system that empowered workers in related domains to get trained in applied mechanics.¹¹⁸ Thus, to develop the workforce to build and maintain the machinery of the IR-1, Britain could draw from a wide pool of blacksmiths, millwrights, gunsmiths and locksmiths, instrument-makers, mechanics, and toolmakers.¹¹⁹

In addition, institutes dedicated to broadening the base of mechanical expertise helped diffuse ironmaking and machine-making skills. Starting in the 1790s, private and informal initiatives created a flurry of trade associations that supported a new class of mechanical and civil engineers and helped connect them with scientific societies and entrepreneurs.¹²⁰ Critical centers included the Andersonian Institution in Glasgow, the Manchester College of Arts and Sciences, the School of Arts in Edinburgh, the Mechanical Institution in London, the Society for the Diffusion of Useful Knowledge, and hundreds of mechanics’ institutes.¹²¹ These institutes helped to absorb knowledge from foreign publications on science and engineering, recruit and upskill new mechanical engineers from a variety of trades, and spread mechanical engineering knowledge more widely.¹²²

It is important to note that these institutional features differed from those more suited to the “heroic inventor” model. In Britain’s cotton textile industry, the classic leading sector of the IR-1, the key institutional complements deviated greatly from the education and training systems that widened the base of mechanical expertise. Collating information from biographies of British engineers, online databases, and detailed economic histories, Ralf Meisenzahl and Joel Mokyr constructed a database of 759 British individuals who made incremental improvements to existing inventions during the industrial revolution.¹²³ Notably, based on their analysis of interactions between tweakers and their institutional surroundings, they found that the textile industry was an outlier in terms of protectiveness over intellectual property rights and reluctance to share information about new techniques. Less than one-tenth of tweakers in textiles published their knowledge to a broader audience or joined professional societies, in stark contrast to the two-thirds of tweakers in mechanically inclined fields who did so.¹²⁴ Over 80 percent of the tweakers who were active primarily in textiles took out at least one patent, compared to just 60 percent for tweakers overall.¹²⁵

These trends in applied mechanics underscore the significance for British mechanization of “collective invention,” a process that involved firms sharing information freely with one another and engineers publishing technical procedures in journals to spur the rapid diffusion of best-practice techniques. According to one analysis of how various districts adapted to the early phases of industrialization, areas that practiced collective invention often cultivated “a much higher degree of technological dynamism than locations which relied extensively on the patent system.”¹²⁶

Britain’s Comparative Advantage over France and the Netherlands in GPT Skill Infrastructure

Britain’s competitors also grasped the significance of Britain’s wide pool of mechanical skills. Whereas codified knowledge crisscrossed western Europe and North America via patent specifications, global exchanges among scientific societies, and extensive visits by foreign observers to British workshops and industrial plants, the European continent struggled greatly to absorb tacit knowledge, especially the know-how embodied in the practical engineering skills of British mechanical tweakers and implementers.¹²⁷ France and the Netherlands fiercely poached British engineers, as the transfer of tacit knowledge in the fields of large-scale ironworking and machine construction almost always necessitated the migration of skilled workers from Britain.¹²⁸ “It was exactly in the skills associated with the strategic new industries of iron and engineering that [Britain’s] lead over other countries was most marked,” argues Mathias.¹²⁹

Why did this repository of engineering skills develop more fruitfully in Britain than in its industrial rivals? A growing body of evidence suggests that Britain’s institutions better adapted its distribution of skills to mechanization. Britain’s institutional advantage was rooted in the system of knowledge diffusion that connected engineers with entrepreneurs, cities with the countryside, and one social class with another. Institutes that trained mechanics took part in a broader “mushrooming of associations” that spread technical knowledge in early-nineteenth-century Britain.¹³⁰ By the mid-nineteenth century, there were 1,020 such associations in Britain, with a total membership of approximately 200,000; clearly, these networks are essential to any explanation that links human capital to Britain’s industrial ascent.¹³¹ Compared to their peers on the continent, British mechanics had superior access to scientific and technical publications.¹³² As a result, the British system of the early nineteenth century had no match in its abundance of people with “technical literacy.”¹³³

The French system, by way of comparison, lacked similar linkages and collaborations between highly educated engineers and local entrepreneurs.¹³⁴ Though France produced elite engineers at schools like the École Polytechnique, it trained too few practitioners to widen the base of mechanical skills.¹³⁵ For example, Napoleon’s early-nineteenth-century reform of France’s higher education system encouraged the training of experts for narrow political and military ends, thereby limiting the ability of trainees to build connections with industry.¹³⁶ These reforms and other industrial policies directed French engineers toward projects associated with luxury industries and specialized military purposes, which “tended to become locked away from the rest of the economy in special enclaves of high cost.”¹³⁷ To illustrate, through the mid-1830s, only one-third of École Polytechnique graduates entered the private sector.¹³⁸ France’s system for disseminating mechanical knowledge and skills was vastly inferior to that of the British.

The Netherlands also failed to develop a base of mechanical skills that linked scientific research to practical ends. In some mechanical sciences, the Dutch generated plenty of potentially useful innovations, even pioneering key breakthroughs that eventually improved the steam engine.¹³⁹ Yet the Dutch struggled to translate these scientific achievements into practical engineering knowledge because they trailed the British in forming institutional settings that made widespread knowledge of applied mechanics possible. Records of Dutch educational systems, the dearth of societies that held lectures and demonstrations for mechanical learning, and the materials available at libraries in technical colleges all “reflected a profound lack of interest in applied mechanics.”¹⁴⁰ In his study of Dutch technological leadership, Karel Davids argues that, during the first three-quarters of the eighteenth century, “collaboration between science and industry in the Netherlands failed to merge in the very period that relations between the two became rapidly closer in Britain.”¹⁴¹

Britain’s advantage in GPT diffusion was not rooted in its higher education system, which lagged far behind the French education system during the IR-1 period.¹⁴² France had already established more than twenty universities before the French Revolution. The French system of higher technical education, from the late eighteenth century through the 1830s, had no rival. The Grandes Écoles system, including the elite École Polytechnique (established in 1794), trained expert scientists and engineers to take on top-level positions as industrial managers and high-level political personnel.¹⁴³ Up until 1826, England had set up only two universities, Oxford and Cambridge. These institutions made limited contributions to training the workforce necessary for industrialization. One study with a sample of 498 British applied scientists and engineers born between 1700 and 1850 found that only 50 were educated at Oxford or Cambridge; 329 were not university-educated.¹⁴⁴

At this point, curiosity naturally leads us to ask why Britain accumulated an advantage in GPT skill infrastructure. Due to practical constraints of time and space, I acknowledge but do not delve into the deeper causes for the notable responsiveness of Britain’s institutions to the skill demands of mechanization. In surveying valuable lines of inquiry on this subject, chapter 2 points to government capacity to adopt long time horizons and reach intertemporal bargains. In the IR-1 case, two specific factors are also worthy of consideration. Attributing Britain’s later success to pre-industrial training practices, some studies suggest that Britain’s apprenticeship system allowed for agile and flexible adaptation to fluctuations in the demand for skills, especially in mechanical trades.¹⁴⁵ Looking even further back, other scholars probe the geographical origins of Britain’s mechanical skills, underscoring the lasting effects of Britain’s adoption of watermills in the early Middle Ages.¹⁴⁶

Alternative Explanations of Britain’s Rise

The history of the IR-1 is certainly not a neglected topic, and the literature features enthusiastic debates over a wide range of possible causes for Britain’s rise. Prominent explanations tie Britain’s early industrialization to population growth,¹⁴⁷ demand and consumption standards,¹⁴⁸ access to raw materials from the colonies,¹⁴⁹ slavery,¹⁵⁰ and trade.¹⁵¹ The obvious concern is that various contextual factors may confound the analysis of the LS and GPT mechanisms.

I am not rewriting the history of the IR-1. I am drawing from one particularly influential and widely held view of the IR-1—that technological advances drove Britain’s industrial ascent—and investigating how technological change and institutional adaptations produced this outcome. The most relevant alternative factors, therefore, are those that provide a different interpretation of how technologies and institutions coevolved to result in Britain’s industrial hegemony. Although I primarily focus on the LS mechanism as the most formidable alternative explanation to the GPT diffusion theory, other explanations also warrant further investigation.

Threat-Based Explanations

Threat-based theories assert that external threats are necessary to incentivize states to innovate and diffuse new technologies. Did Britain owe its technological leadership to war and its military’s impetus to modernize? During the IR-1 period, Britain was embroiled in the Revolutionary and Napoleonic Wars (1793–1815), a near-continuous stretch of conflicts involving France and other European states. If threat-based explanations stand up in the IR-1 case, then the historical record should show that these wars made an essential and positive contribution to Britain’s adoption of iron machinery and mechanization.

Some evidence supports this argument. In 1805, the British government’s wartime needs for iron accounted for 17 percent of total British iron output.¹⁵² This wartime stimulus to iron production facilitated improvements in iron railways, iron ships, and steam engines.¹⁵³ In particular, military investments in gunmaking produced important spin-offs in textiles and machine tools, most famously encapsulated by Watt’s dependence on John Wilkinson’s cannon boring techniques to make the condenser cylinders for his steam engine.¹⁵⁴

On the flip side, war’s disruptive costs are likely to have offset any stimulus to Britain’s mechanization. Aside from Wilkinson’s cannon boring device and some incremental improvements, wartime pressures did not produce any major technological breakthroughs for the civilian economy.¹⁵⁵ Military needs absorbed productive laborers from Britain’s civilian economy, resulting in labor shortages.¹⁵⁶ War also limited both the domestic demand for iron, by halting investment in construction, agriculture, and other industries, and the foreign demand for British iron, by cutting off foreign trade. Historian Charles Hyde notes, “In the absence of fighting, overall demand for iron might have been higher than it was.”¹⁵⁷

Furthermore, any temporary benefits that accrued to Britain’s iron industry in wartime were wiped out in the transition to peacetime. In one influential text, historian Thomas Ashton reviewed how each of the wars of the eighteenth century, including the Revolutionary and Napoleonic Wars, affected Britain’s iron industry.¹⁵⁸ He observed a similar pattern in each case. At first, the outbreak of hostilities boosts demand for iron in the form of armament, and trade disruptions protect domestic producers against foreign competitors. This initial boom is followed by a severe crash, however, when the iron industry adjusts to the conflict’s aftermath. A trade depression follows. Converting foundries to make plowshares instead of cannons incurs heavy losses, made even more painful by the fact that war conditions promoted “feverish” developments that were unsustainable in the long run.¹⁵⁹

On a more fundamental level, threat-based theories have limited leverage in explaining Britain’s relative rise because its economic competitors were also embroiled in conflicts—in many cases, against Britain. The Dutch fought Britain in the Fourth Anglo-Dutch War (1780–1784) as well as in the Napoleonic Wars.¹⁶⁰ Of course, during this time, France was Britain’s main military opponent. Thus, since the Netherlands and France also faced a threatening external environment, the net effect of the war on economic growth differentials should have been minimal.¹⁶¹ If anything, since France fought on many more fronts than Britain during this period, proponents of threat-based explanations would expect France to have experienced more effective and widespread diffusion of iron machinery throughout its economy. The case analysis clearly discredits that expected outcome.

VoC Explanations

Can Britain’s particular brand of capitalism account for its technological rise? The varieties of capitalism (VoC) approach posits that liberal market economies (LMEs) are particularly suited to radical innovation. Consistent with this framework, international political economy scholars emphasize that Britain’s free market economy supported gains in rapidly changing technological domains like consumer goods, light machine tools, and textiles.¹⁶² During the IR-1 period, Britain began to develop the institutional features that would cement it as an LME, including decentralized collective bargaining and high levels of corporatization.¹⁶³ Most pertinent to GPT diffusion theory, VoC scholars expect LMEs like Britain to excel at cultivating general skills, which help transfer GPT-related knowledge and techniques across firms.

Taking measure of Britain’s human capital development in general skills in this period is therefore central to evaluating whether its technological leadership can be explained by its form of capitalism. Overall, estimates of literacy rates and school attendance demonstrate that the general level of human capital in Britain was notably low for an industrial leader.¹⁶⁴ British literacy rates for males were relatively stagnant between 1750 and 1850, and average literacy rates in Britain were much lower than rates in the Netherlands and barely higher than those in France around the turn of the nineteenth century.¹⁶⁵ In fact, general levels of educational attainment in Britain, as measured by average years of schooling, declined from around 1.4 years in 1740 to 1.25 years in 1820.¹⁶⁶ Contrary to VoC theory’s expectations, Britain did not hold an advantage in general skills during this period.

The VoC explanation’s applicability to the IR-1 period is further limited by issues with designating Britain as the only LME in this period. Like Britain, the Netherlands functioned as a relatively open economy and exhibited tendencies toward economic liberalism, but it was not able to adapt and diffuse significant technological changes.¹⁶⁷ Though France is now considered a coordinated market economy, in the early nineteenth century it took on some of the characteristics of LMEs by implementing capital market reforms and trade liberalization.¹⁶⁸ The VoC approach therefore struggles to resolve why these two LMEs diverged so greatly in their adaptation to mechanization.

Case-Specific Factors

Among other factors specific to the IR-1 setting, one alternative explanation emphasizes Britain’s fortunate geographic circumstances. More specifically, classic works have argued that proximity to plentiful coalfields was essential to British industrialization.¹⁶⁹ These natural resource endowments enabled the expansion of coal-intensive industries, such as the iron industry. In this line of thinking, the fact that coal was cheaper in Britain than elsewhere in Europe explains why Britain was the first to sustain productivity leadership.¹⁷⁰

The relationship between coal and industrialization does not necessarily undermine the GPT mechanism. For one, in principle, Britain’s competitors could also have effectively leveraged coal resources. The southern provinces of the Netherlands were located close to Belgian coalfields.¹⁷¹ Over the course of the eighteenth century, Dutch industry had mostly shifted to coal, and away from peat stocks, as a key source of energy.¹⁷² Even if Britain’s industrial rivals had to pay more by importing coal, the expected productivity gains associated with adopting new technologies should have outweighed these costs. Moreover, GPT skill infrastructure could have mediated the relationship between coal and mechanization, as Britain’s edge in metalworking skills spurred the adoption of new coal-using technologies, which strengthened the connection between proximity to coal and economic growth.¹⁷³

Summary

In many ways, the industrial revolution marked an exceptional transformation. It is to any number of historical trends what the birth of Jesus is to the Gregorian calendar—an inflection point that separates “before” and “after.” For my purposes, however, the industrial revolution is a typical case showing how technological revolutions influence the rise and fall of great powers. Evidence from great powers’ different adaptations to technological changes in this period therefore helps test GPT diffusion theory against the LS mechanism.

In sum, GPT diffusion theory best explains why Britain led Europe’s industrial transformation in this period. Britain effectively capitalized on general-purpose improvements in mechanization owing to its institutional advantages that were conducive to widening the pool of mechanical skills and knowledge. According to GPT diffusion theory, countries like this disproportionately benefit from technological revolutions because they adapt more successfully to the GPT trajectories that transform productivity. In line with these expectations, Britain was more successful than its industrial rivals in sustaining long-term economic growth, which became the foundation of its unrivaled power in the early and mid-nineteenth century.

On the flip side, this chapter’s case analysis undercuts the LS-based explanation. The timeframe for when leading sectors were expected to stimulate Britain’s productivity growth did not align with when Britain’s industrialization took off. Britain’s economic ascent owed more to the widespread adoption of iron metalworking and linked production machinery than to monopoly profits from cotton textiles. The key institutional complements were not those that produced heroic inventions—Britain’s rivals held their own in these areas—but rather those that fostered widespread knowledge of applied mechanics.

Do these findings hold in other periods of technological and geopolitical upheaval? The IR-1 was one of the most extraordinary phases in history, but it was not the only era to attain the title of an “industrial revolution.” To further explore these dynamics, it is only appropriate to turn to the period some have labeled the Second Industrial Revolution.

4 The Second Industrial Revolution and America’s Ascent

In the late nineteenth and early twentieth centuries, the technological and geopolitical landscape transformed in ways familiar to observers of today’s environment. “AI is the new electricity” goes a common refrain that compares current advances in machine intelligence to electrical innovations 150 years ago. Those fundamental breakthroughs, alongside others in steel, chemicals, and machine tools, sparked the Second Industrial Revolution (IR-2), which unfolded from 1870 to 1914.¹ Studies of how present-day technological advances could change the balance of power draw on geopolitical competition for technological leadership in the IR-2 as a key reference point.²

Often overshadowed by its predecessor, the IR-2 is equally important for investigating causal patterns that connect technological revolutions and economic power transitions. The presence of both cause and outcome ensures a fruitful test of the GPT and LS mechanisms. The beginning of the period featured remarkable technological innovations, including the universal milling machine, the electric dynamo, the synthesis of indigo dye, and the internal combustion engine. According to some scholars, one would be hard-pressed to find another period with a higher density of important scientific advances.³ By the end of the period, Britain’s decline and the rise of Germany and the United States had yielded a new balance of economic power, which one historian describes as a “shift from monarchy to oligarchy, from a one-nation to a multi-nation industrial system.”⁴ Arguably, British industrial decline in the IR-2 was the ultimate cause of World War I.⁵

International relations scholars hold up the IR-2 as a classic case of a power transition caused by LS product cycles.⁶ According to this view, Britain’s rivals cornered market shares in the new, fast-growing industries arising from major technological innovations in electricity, chemicals, and steel.⁷ Specifically, scholars argue that Germany surpassed Britain in the IR-2 because it was “the first to introduce the most important innovations” in these key sectors.⁸ Analysis of emerging technologies and today’s rising powers follows a similar template when it compares China’s scientific and technological capabilities to Germany’s ability to develop major innovations in chemicals.⁹ Thus, as a most likely case for the LS mechanism, which is favored by background conditions and existing theoretical explanations, the IR-2 acts as a good test for the GPT mechanism.

Historical evidence from this period challenges this conventional narrative. No country monopolized innovation in leading sectors such as chemicals, electricity, steel, and motor vehicles. Productivity growth in the United States, which overtook Britain in productivity leadership during the IR-2, was not dominated by a few R&D-based sectors. Moreover, major breakthroughs in electricity and chemicals, prominent factors in LS accounts, required a protracted process of diffusion across many sectors before their impact was felt. This made them unlikely key drivers of the economic rise of the United States before 1914.

Instead, the IR-2 case evidence supports GPT diffusion theory. Spurred by inventions in machine tools, the industrial production of interchangeable parts, known as the “American system of manufacturing,” embodied the key GPT trajectory.¹⁰ The United States did not lead the world in producing the most advanced machinery; rather, it had an advantage over Britain in adapting machine tools across almost all branches of industry. Though the American system’s diffusion also required a long gestation period, the timing matches America’s industrial rise. Incubated by the growing specialization of machine tools in the mid-nineteenth century, the application of interchangeable parts across a broad range of manufacturing industries was the key driving force of America’s relative economic success in the IR-2.¹¹

Since a nation’s efficacy in adapting to technological revolutions is determined by how well its institutions complement the demands of emerging technologies, the GPT model of the IR-2 highlights institutional factors that differ from those featured in standard accounts. LS-based theories tend to highlight Germany’s institutional competencies in scientific education and industrial R&D.¹² In contrast, the case analysis points toward the American ability to develop a broad base of mechanical engineering skills and standardize best practices in mechanical engineering. Practice-oriented technical education at American land-grant colleges and technical institutes enabled the United States to take better advantage of interchangeable manufacturing methods than its rivals.

This chapter’s evidence comes from a variety of sources. In tracing the contours of technological trajectories and the economic power transition in this period, I relied on histories of technology, categorization schemes from the long-cycle literature, general accounts of economic historians, and revised versions of historical productivity measures. I investigated the fit between institutions and technology in leading economies using annual reports of the US Commissioner of Education, British diplomatic and consular reports, cross-national data on technological diffusion, German engineering periodicals, and firsthand accounts from inspection teams commissioned to study related issues.¹³ My analysis benefited from archival materials based at the Bodleian Library’s Marconi Archives (United Kingdom), the Library of Congress (United States), and the University of Leipzig and from records of the British Foreign Office.

This chapter proceeds as follows. I begin by chronicling the economic power transition that took place during the IR-2 to clarify that the United States, not Germany, ascended to industrial preeminence. I then identify the key technological breakthroughs, which I sort according to their ties to GPT and LS trajectories. Along the dimensions of impact timeframe, phase of relative advantage, and breadth of growth, this chapter demonstrates that the GPT trajectory aligns better with how the IR-2 enabled the economic rise of the United States. Next, I evaluate whether differences in GPT skill infrastructure can account for the American edge over Britain and Germany in interchangeable manufacturing methods. Toward the chapter’s end, I also tackle alternative explanations.¹⁴

A Power Transition: America’s Ascent

To begin, tracing when an economic power transition takes place is critical. In 1860, Britain was still at the apogee of its industrial power.¹⁵ Most historians agree that British industrial preeminence eroded in the late nineteenth century. By 1913, both the United States and Germany had emerged as formidable rivals to Britain with respect to the industrial and productive foundations of national power. According to Paul Kennedy’s influential account, before World War I Britain was “in third place,” and “in terms of industrial muscle, both the United States and imperial Germany had moved ahead.”¹⁶ Aided by more data than was available for the IR-1 case, I map the timeline of this economic power transition with various measures of industrial output and efficiency.

In the IR-2 case, clarifying who surpassed Britain in economic efficiency takes on added gravity. Whereas in the IR-1 Britain separated itself from the rest, both the United States and Germany challenged British industrial power in the IR-2. But studies of this case often neglect the rise of the United States. Preoccupied with debates over whether Germany’s overtaking of Britain sparked World War I, the power transition literature has directed most of its attention to the Anglo-German competition for economic leadership.¹⁷ Some LS-based accounts explain only Germany’s rise in this period without investigating America’s ascent.¹⁸

As the rest of this section will show, Germany and the United States both surpassed Britain on some measures of economic power, but the United States emerged as the clear productivity leader. Therefore, any explanation of the rise and fall of technological leadership in this period must be centered on the US experience. The following sections trace the contours of the IR-2’s economic power transition with a range of indicators for productivity leadership, including GDP per capita, industrialization, labor productivity, and total factor productivity.

GDP PER-CAPITA INDICATORS

Changes in total GDP over the course of the IR-2 provide a useful departure point for understanding changes in the balance of productive power. At the beginning of the period in 1871, Germany’s economy was around three-quarters the size of the British economy; by the end of the period, in 1913, Germany’s economy was approximately 14 percent larger than Britain’s. The growth trajectory of the American economy was even starker. Over the same time period, overall economic output in the United States increased from 1.2 times to around 3.4 times Britain’s total GDP.¹⁹ This trend is further confirmed by the growth rates of overall GDP for the three countries. In the period between 1870 and 1913, the US GDP grew roughly 5.3 times over, compared to 3.3 for Germany and 2.2 for the United Kingdom.²⁰

While gross economic size puts countries in contention for economic leadership, the most crucial outcome is sustained economic efficiency. Compared to total output, trend lines in real GDP per capita mark out a broadly similar picture of the IR-2 period, but they also differ in two significant respects (figure 4.1). First, whereas the United States was already the largest economy by total output in 1870, the United Kingdom maintained a slight lead in real GDP per capita over the United States in the 1870s. The United Kingdom’s average GDP per capita over the decade was about 15 percent higher than the US equivalent.²¹ US GDP per capita was roughly on par with Britain’s throughout the 1880s and 1890s, but the United States established a substantial lead starting around 1900.²²

FIGURE 4.1. Economic Power Transition during the IR-2. *Source*: Maddison Project Database, version 2020 (Bolt and van Zanden 2020).

Second, in contrast to trend lines in aggregate economic output, Germany did not surpass Britain in GDP per capita before World War I. Germany certainly closed the gap, as its GDP per capita increased from around 50 percent of British GDP per capita in 1870 to around 70 percent in the years before World War I. However, Germany never even came close to overtaking the United Kingdom in GDP per capita during this period.²³ This is an important distinction that justifies the focus on US technological success in the IR-2, since surpassing at the technological frontier is a different challenge than merely catching up to the technological frontier.

INDUSTRIALIZATION INDICATORS

Industrialization indicators back up the findings from the GDP per-capita data. The United States emerged as the preeminent industrial power, boasting an aggregate industrial output in 1913 that equaled 36 percent of the global total—a figure that exceeded the combined share of both Great Britain and Germany.²⁴ More importantly, the United States became the leading country in terms of industrial efficiency, with a per-capita industrialization level about 10 percent higher than Britain’s in 1913.²⁵

Once again, the emphasis on productivity over aggregate output reveals that the economic gap between Germany and Britain narrowed but did not disappear. In aggregate terms, Germany’s share of the world’s industrial production rose to 16 percent in 1913. This eclipsed Britain’s share, which declined from 32 percent of the world’s industrial production in 1870 to just 15 percent in 1913.²⁶ However, Germany did not overtake Britain in industrial efficiency. In 1913, its per-capita industrialization level was about 75 percent of Britain’s.²⁷ The magnitude of this gap was approximately the same as the gap between German per-capita GDP and British per-capita GDP.

PRODUCTIVITY INDICATORS

Lastly, I consider various productivity statistics. Stephen Broadberry’s work on the “productivity race” contains the most comprehensive and rigorous assessments of productivity levels in Britain, Germany, and the United States in this period.²⁸ Comparative statistics on labor productivity line up with findings from other indicators (figure 4.2). The United States surpassed Britain in aggregate labor productivity during the 1890s or 1900s, whereas Germany’s aggregate labor productivity increased relative to but did not fully overtake Britain’s over the IR-2 period.²⁹

FIGURE 4.2. Comparative Labor Productivity Levels in the IR-2. *Source*: Broadberry 2006, 110.

Another set of productivity indicators, Maddison’s well-known and oft-cited historical data on comparative GDP per hour worked, supports Broadberry’s comparative measures of labor productivity levels.³⁰ According to Maddison’s estimates of the average rate of productivity growth from 1870 to 1913, the American and German economies were both growing more productive relative to the British economy. The growth rate of America’s GDP per hour worked was 1.9 percent compared to 1.8 percent for the German rate and 1.2 percent for the UK rate.³¹
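To see how these small annual differences add up, the growth rates can be compounded over the 1870 to 1913 window. The following is a minimal sketch, assuming each rate holds constant every year and compounds annually; that is a simplification made for illustration, not a claim about how Maddison's estimates were constructed, and the function names are hypothetical.

```python
# Average annual growth of GDP per hour worked, 1870-1913, as cited in the text.
RATES = {"United States": 0.019, "Germany": 0.018, "United Kingdom": 0.012}
YEARS = 1913 - 1870  # 43 years; constant annual compounding is an assumption


def cumulative_multiple(rate, years=YEARS):
    """Factor by which productivity grows if `rate` compounds for `years`."""
    return (1.0 + rate) ** years


def gain_relative_to_uk(country):
    """Change in a country's productivity level relative to the UK's."""
    uk = cumulative_multiple(RATES["United Kingdom"])
    return cumulative_multiple(RATES[country]) / uk


us_gain = gain_relative_to_uk("United States")
de_gain = gain_relative_to_uk("Germany")
print(f"US level relative to UK rose by a factor of {us_gain:.2f}")
print(f"German level relative to UK rose by a factor of {de_gain:.2f}")
```

Under these assumptions, both economies gain roughly a third on Britain in relative terms over the period, with the United States slightly ahead of Germany. The calculation says nothing about starting levels, which is why the level comparisons from Broadberry remain necessary for judging who actually overtook whom.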

It should be noted that the United Kingdom may have retained a total factor productivity lead in this period. Based on 1909 figures, the last measurements available before World War I, the US aggregate TFP was a little over 90 percent of Britain’s. By 1919, US aggregate TFP was nearly 10 percent larger than Britain’s.³² The United States could have surpassed Britain in overall TFP before World War I, but the data do not clearly demonstrate this outcome. Still, the TFP data track well with the general trends found in other measures of economic efficiency, including a marked increase in US TFP in the 1890s and 1900s as well as a steady narrowing of the gap between UK and German TFP throughout the period. Issues related to the availability, reliability, and comparability of capital stock estimates during this period, however, caution against concluding too much from the TFP trends alone.³³

Albeit with some caveats, the general thrust of evidence confirms that the United States overtook Britain in productivity leadership around the turn of the twentieth century. In productive efficiency, Germany significantly narrowed the gap but did not surpass Britain. A clarified picture of the outcome also helps guide the assessment of the LS and GPT mechanisms. In contrast to work that focuses on Anglo-German rivalry in this period, I prioritize explaining why the United States became the preeminent economic power. Moreover, if GPT diffusion theory holds for this period, it should also explain why the United States was more successful than Germany in overtaking Britain in productivity during this period.

Key Technological Changes in the IR-2

Which technological changes could have sparked the economic power transition before World War I? The IR-2 was an age of dizzying technological breakthroughs, including but not limited to the electric dynamo (1871), the first internal combustion engine (1876), the Thomas process for steel manufacturing (1877), and the synthesis of indigo dye (1880).³⁴ Tracking down how every single technical advance could have affected the growth differentials among Britain, Germany, and the United States is an unmanageable task. I narrow the scope of analysis to the most likely sources of LS and GPT trajectories based on previous scholarship that calls attention to the significance of certain technological developments in the IR-2. Once confirmed to meet the established criteria for leading sectors and GPTs, these technological drivers serve as the fields of reference for assessing the validity of the GPT and LS mechanisms in this case.

Candidate Leading Sectors

I focus on the chemicals, electrical equipment, motor vehicles, and steel industries as the leading sectors of the IR-2. These choices are informed by scholars who study the implications of technological change during this period from an LS perspective. The first three sectors feature in the standard rendering of the IR-2 by prominent historical accounts, which centers major discoveries in chemistry and electricity as well as the invention of the internal combustion engine.³⁵ Among those who study the effect of technological revolutions on the balance of power, there is near-consensus that the chemicals and electrical industries were technologically advanced, fast-growing industries during this time.³⁶ Some scholars also identify the automobile industry as a key industry in this period.³⁷ Others reason, however, that automobiles did not emerge as a leading sector until a later period.³⁸

The automobile, chemicals, and electrical industries all experienced prodigious growth during the IR-2, meeting the primary qualification for leading sectors. According to statistics from the US census, the percentage increase in value added by manufacture in each of the chemicals, electrical, and automobile industries was much higher than the average across all industries from 1899 through 1909. In fact, the automobile and electrical equipment industries boasted the two highest rates of percentage growth in value added over this period among sectors with a market size over $100 million.³⁹

I also consider developments in steel as a possible source of leading-sector product cycles. It is hard to ignore the explosive growth of the steel industry in both Germany, where it multiplied over 100-fold from 1870 to 1913, and the United States, where it multiplied around 450 times over the same period.⁴⁰ In addition, many scholars list steel as one of the leading sectors that affected the economic power balance in the IR-2.⁴¹ Rostow identifies steel as part of “the classic sequence” of “great leading sectors.”⁴² In sum, I consider four candidate leading sectors in this period: the automobile, chemicals, electrical equipment, and steel industries.

Candidate GPTs

I analyze chemicalization, electrification, the internal combustion engine, and interchangeable manufacture as potential drivers of GPT-style transformations in the IR-2. Of these four, electricity is the prototypical GPT. It is “unanimously seen in the literature as a historical example of a GPT.”⁴³ Electricity is one of three technologies, alongside the steam engine and information and communications technology (ICT), that feature in nearly every article that seeks to identify GPTs throughout history.⁴⁴ Electrical technologies possessed an enormous scope for improvement, fed into a variety of products and processes, and synergized with many other streams of technological development. Empirical efforts to identify GPTs with patent data provide further evidence of electricity as a GPT in this period.⁴⁵

Like advances in electricity, clusters of innovations in chemicals and the internal combustion engine not only spurred the rapid growth of new industries but also served as a potential source of GPT trajectories. Historians of technology pick out chemicalization, alongside electrification, as one of two central processes that transformed production routines in the early twentieth century.⁴⁶ Historical patent data confirm that chemical inventions could influence a wide variety of products and processes.⁴⁷

In line with GPT classification schemes by other scholars, I also evaluate the internal combustion engine as a candidate GPT, with the potential to replace the steam engine as a prime mover of many industrial processes.⁴⁸ After its introduction, many believed that the internal combustion engine would transform a range of manufacturing processes with smaller, divisible power units.⁴⁹

Lastly, I examine the advance of interchangeable manufacture, spurred by innovations in machine tools, as a candidate GPT in this period. Though the machine tool industry was neither new nor especially fast-growing, it did play a central role in extending the mechanization of machine-making first incubated in the IR-1. The diffusion of interchangeable manufacture, or the “American system,” owed much to advances in turret lathes, milling machines, and other machine tools that improved the precision of cutting and shaping metals. Rosenberg’s seminal study of “technological convergence” between the American machine tool industry and metal-using sectors highlighted how innovations in metalworking machines transformed production processes across a wide range of industries.⁵⁰ Following Rosenberg’s interpretation, historians recognize the nexus of machine tools and mechanization as one of the key technological trajectories during this period.⁵¹

Sources of LS and GPT Trajectories

I aimed to include as many candidate technological drivers as possible, provided that the technological developments credibly met the criteria of a leading sector or GPT.⁵² All candidate leading sectors and GPTs I study in this period were flagged in multiple articles or books that explicitly identified leading sectors or GPTs in the IR-2 period, which helped provide an initial filter for selection. This allows for a good test of the GPT diffusion mechanism against the LS product cycles mechanism.⁵³ This sorting process is an important initial step for evaluating the two mechanisms, though a deeper excavation of the historical evidence is required to determine whether the candidates actually made the cut.

Table 4.1 Key Sources of Technological Trajectories in the IR-2

Candidate Leading Sectors	Candidate GPTs
Steel industry	Interchangeable manufacture
Electrical equipment industry	Electrification
Chemicals industry	Chemicalization
Automobile industry	Internal combustion engine

There is substantial overlap between the candidate GPTs and leading sectors in the IR-2, as reflected in table 4.1, but two key distinctions are worth emphasizing. First, one difference between the candidate GPTs and leading sectors is the inclusion of machine tools in the former category. The international relations scholarship on leading sectors overlooks the impact of machine tools in this period, possibly because the industry’s total output did not rank among the largest industries, and also because innovation in machine tools was relatively incremental.⁵⁴ One survey of technical development in machine tools from 1850 to 1914 described the landscape as “essentially a series of minor adaptations and improvements.”⁵⁵ Relatedly, the steel industry, commonly regarded as an LS, is not considered a candidate GPT. Under the GPT mechanism, innovations in steel are bound up in a GPT trajectory driven by advances in machine tools.

Second, even though some technological drivers, such as electricity, are considered both candidate leading sectors and candidate GPTs, there are different interpretations of how developments in these technological domains translated into an economic power transition. In the case of new electrical discoveries, control over market share and exports in the electrical equipment industry represents the LS trajectory, whereas the gradual spread of electrification across many industries stands in for the GPT trajectory. Two trajectories diverge in a yellow wood, and the case study evidence will show which one electricity traveled.⁵⁶

GPT vs. LS Trajectories in the IR-2

Equipped with a better grasp of the possible technological drivers in the IR-2, I follow the same procedures used in the previous chapter to assess the validity of the GPT and LS mechanisms.

OBSERVABLE IMPLICATIONS RELATED TO THE IMPACT TIMEFRAME

GPT diffusion and LS product cycles present two competing interpretations of the IR-2’s impact timeframe. The LS mechanism expects growth associated with radical technological breakthroughs to be explosive in the initial stages. Under this view, off the back of major breakthroughs such as the first practical electric dynamo (1871), the modern internal combustion engine (1876), and the successful synthesis of indigo dye (1880), new leading sectors took off in the 1870s and 1880s.⁵⁷ Then, according to the expected timeline of the LS mechanism, these new industries stimulated substantial growth in the early stages of their development, bringing about a pre–World War I upheaval in the industrial balance of power.⁵⁸

The GPT trajectory gives a different timeline for when productivity benefits from major technological breakthroughs were realized on an economy-wide scale. Before stimulating economy-wide growth, the candidate GPTs that emerged in the 1880s—tied to advances in electricity, chemicals, and the internal combustion engine—required many decades of complementary innovations in application sectors and human capital upgrading. These candidate GPTs should have contributed only modestly to the industrial rise of the United States before World War I, with impacts, if any, materializing toward the very end of the period.

Critically, one candidate GPT should have produced substantial economic effects during this period. Unlike other GPT trajectories, interchangeable manufacture had been incubated by earlier advances in machine tools, such as the turret lathe (1845) and the universal milling machine (1861).⁵⁹ Thus, by the late nineteenth century, interchangeable manufacturing methods should have diffused widely enough to make a significant impact on US industrial productivity.

OBSERVABLE IMPLICATIONS RELATED TO THE PHASE OF RELATIVE ADVANTAGE

When spelling out how the IR-2 produced an economic power transition, the two mechanisms also stress different phases of technological change. According to the LS mechanism, Britain’s industrial prominence waned because it lost its dominance of innovation in the IR-2’s new industries. The United States and Germany benefited from monopoly profits accrued from being lead innovators in electrical equipment, chemical production, automobiles, and steel. In particular, Germany’s industrial rise in this period draws a disproportionate share of attention. Many LS accounts attribute Germany’s rise to its dominance of innovations in the chemical industry, “the first science-based industry.”⁶⁰ Others emphasize that the American global lead in the share of fundamental innovations after 1850 paved the way for the United States to dominate new industries and become the leading economy in the IR-2.⁶¹

The GPT mechanism has different expectations regarding the key determinant of productivity differentials. Where innovations are adopted more effectively has greater significance than where they are first introduced. According to this perspective, Britain lost its industrial preeminence because the United States was more effective at intensively adopting the IR-2’s GPTs.

OBSERVABLE IMPLICATIONS RELATED TO BREADTH OF GROWTH

Finally, regarding the breadth of growth, the third dimension on which the two mechanisms diverge, the LS trajectory expects that a narrow set of modernized industries drove productivity differentials, whereas the GPT trajectory holds that a broad range of industries contributed to productivity differentials. The US growth pattern serves as the best testing ground for these diverging predictions, since the United States overtook Britain as the economic leader in this period.

Table 4.2 Testable Predictions for the IR-2 Case Analysis

Prediction 1: LS (impact timeframe)	The steel, electrical equipment, chemicals, and/or* automobile industries made a significant impact on the rise of the United States to productivity leadership before 1914.
Prediction 1: GPT	Electrification, chemicalization, and/or the internal combustion engine made a significant impact on the rise of the United States to productivity leadership only after 1914. The extension of interchangeable manufacture made a significant impact on the rise of the United States to productivity leadership before 1914.
Prediction 2: LS (phase of relative advantage)	Innovations in the steel, electrical equipment, chemicals, and/or automobile industries were concentrated in the United States. German and American advantages in the production and export of electrical equipment, chemical products, automobiles, and/or steel were crucial to their industrial superiority.
Prediction 2: GPT	Innovations in machine tools, electricity, chemicals, and/or the internal combustion engine were not concentrated in the United States. American advantages in the diffusion of interchangeable manufacture were crucial to its productivity leadership.
Prediction 3: LS (breadth of growth)	Productivity growth in the United States was limited to the steel, electrical, chemicals, and/or automotive industries.
Prediction 3: GPT	Productivity growth in the United States was spread across a broad range of industries linked to interchangeable manufacture.
*The operator “and/or” links all the candidate leading sectors and GPTs because it could be the case that only some of these technologies drove the trajectories of the period.

The two explanations hold different views about how technological disruptions produced an economic power transition, related to the impact timeframe of new advances, the phase of technological change that yields relative advantages, and the breadth of technology-fueled growth. Based on the differences between the LS and GPT mechanisms across these dimensions, I derive three sets of diverging predictions for how technological changes contributed to relative shifts in economic productivity during this period. Table 4.2 collects these predictions, which structure the case analysis in the following sections.

Impact Timeframe: Gradual Gains vs. Immediate Effects from New Breakthroughs

The opening move in assessing the LS and GPT mechanisms is determining when the IR-2’s eye-catching technological advances actually made their mark on leading economies. Tracking the development timelines for all the candidate leading sectors and GPTs of the IR-2 produces two clear takeaways. First, innovations related to electricity, chemicals, and the internal combustion engine did not make a significant impact on US productivity leadership until after 1914. Second, advances in machine tools and steel—the remaining candidate GPT and leading sector, respectively—contributed substantially to US economic growth before World War I; thus, their impact timeframes fit better with when the United States overtook Britain as the preeminent economic power.

DELAYED TIMELINES: CHEMICALS, ELECTRICITY, AND THE INTERNAL COMBUSTION ENGINE

Developments in chemicals, electricity, and internal combustion provide evidence against the LS interpretation. If the LS mechanism was operational in the IR-2, advances in chemicals should have made a significant impact on US productivity leadership before World War I.⁶² Yet, in 1914, the United States was home to only seven dye-making firms.⁶³ Major US chemicals firms did not establish industrial research laboratories like those of their German counterparts until the first decade of the twentieth century.⁶⁴ Terry Reynolds, author of a history of the American Institute of Chemical Engineers, concludes, “Widespread use of chemists in American industrial research laboratories was largely a post–World War I phenomenon.”⁶⁵ Thus, it is very unlikely that chemical innovations made a meaningful difference to growth differentials between the United States and Britain before 1914.

At first glance, the growth of the German chemical industry aligns with the LS model’s expectations. Germany was the first to incorporate scientific research into chemical production, resulting in the synthesis of many artificial dyes before 1880.⁶⁶ Overtaking Britain in leadership of the chemical industry, Germany produced 140,000 tons of dyestuffs in 1913, more than 85 percent of the world total.⁶⁷

While Germany’s rapid growth trajectory in synthetic dyes was impressive, the greater economic impacts of chemical advances materialized after 1914 through a different pathway: “chemicalization,” or the spread of chemical processes across ceramics, food-processing, glass, metallurgy, petroleum refining, and many other industries.⁶⁸ Prior to key chemical engineering advances in the 1920s, industrial chemists devoted limited attention to unifying principles across the manufacture of different products. The rapid expansion of chemical-based industries in the twentieth century owed more to these later improvements in chemical engineering than earlier progress in synthetic dyes.⁶⁹ Ultimately, these delayed spillovers from chemicalization were substantial, as evidenced by higher growth rates in the German chemical industry during the interwar period than in the two decades before World War I.⁷⁰

Electrification’s impact timeframe with respect to US productivity growth mirrored that of chemicalization. Scholarly consensus attributes the US productivity upsurge after 1914 to the delayed impact of the electrification of manufacturing.⁷¹ From 1880 to 1930, power production and distribution systems gradually evolved from shaft and belt drive systems driven by a central steam engine or water wheel to electric unit drive, in which electric motors powered individual machines. Unit drive became the predominant method in the 1920s only after vigorous debates in technical associations over its relative merits, the emergence of large utilities that improved access to cheap electricity, and complementary innovations, like machine tools, that were compatible with electric motors.⁷²

Quantitative indicators also verify the long interval between key electrical advances and electrification’s productivity boost. Economic geographer Sergio Petralia has investigated the causal relationship between adoption of electrical and electronic (E&E) technologies, operationalized as E&E patenting activity in individual American counties, and the per-capita growth of those counties over time. One of his main findings is that the effects of E&E technology adoption on growth are not significant prior to 1914.⁷³ This timeline is confirmed by a range of other metrics, including the energy efficiency of the American economy, electric motors’ share of horsepower in manufacturing, and estimates of electricity’s total contribution to economic growth.⁷⁴

The diffusion of internal combustion engines across application sectors was also slow. Despite its initial promise, the internal combustion engine never accounted for more than 5 percent of the generation of total horsepower in US manufacturing from 1869 to 1939.⁷⁵ In 1900, there were only eight thousand cars in the entire United States, and the US motor vehicle industry did not overtake its French competitor as the world’s largest until 1904.⁷⁶ Furthermore, the turning point for the mass production of automobiles, Ford’s installation of a moving assembly line for making Model Ts, did not occur until 1913.⁷⁷

KEY TIMINGS: MACHINE TOOLS AND STEEL

When assigning credit to certain technologies for major upheavals in global affairs, awe of the new often overwhelms recognition of the old. Based on the previous analysis, it is unlikely that new breakthroughs in electricity, chemicals, and internal combustion fueled the economic power transition that transpired in this period. Instead, careful tracing reveals the persevering impact of earlier developments in machine tools.⁷⁸ During the IR-2, technical advances in machine tools were incremental, continuous improvements that helped disseminate transformative breakthroughs from the mid-nineteenth century, such as the turret lathe and the universal milling machine.⁷⁹

Profiles of key application sectors and quantitative indicators validate the GPT mechanism’s expected impact timeframe for machine tools. Marking 1880 as the date when “the proliferation of new machine tools in American industry had begun to reach torrential proportions,” Rosenberg outlines how three application sectors—sewing machines, bicycles, and automobiles—successively adopted improved metal-cutting techniques from 1880 to 1910.⁸⁰ As the American system took hold, the number of potential machine tool users multiplied 15-fold, from just 95,000 workers in 1850 to almost 1.5 million in 1910.⁸¹ Patenting data identify the last third of the nineteenth century as the period when extensive technological convergence characterized the machine tool industry and application sectors.⁸²
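The scale of this expansion can be checked with a short calculation. The following is a rough sketch that treats "almost 1.5 million" as exactly 1.5 million and assumes smooth compound growth over the sixty years, both of which are simplifications introduced here for illustration.

```python
# Potential machine tool users, from the figures cited in the text.
WORKERS_1850 = 95_000
WORKERS_1910 = 1_500_000  # assumption: "almost 1.5 million" rounded up
SPAN_YEARS = 1910 - 1850  # 60 years

# Overall multiple between the two endpoints.
fold_increase = WORKERS_1910 / WORKERS_1850

# Implied average annual growth if the expansion had been perfectly smooth.
implied_annual_growth = fold_increase ** (1 / SPAN_YEARS) - 1

print(f"Fold increase: {fold_increase:.1f}")
print(f"Implied average annual growth: {implied_annual_growth:.1%}")
```

With these rounded inputs the multiple comes out slightly above fifteen, consistent with the 15-fold figure, and corresponds to sustained growth of a little under 5 percent per year in the pool of potential users.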

FIGURE 4.3. Technological Impact Timeframes in the IR-2. *Note*: US chemical production, horsepower from electric central stations, and machine intensity over time. *Source*: Murmann 2003; US Census Bureau 1975.

Figure 4.3 depicts the diverging impact timeframes of interchangeable manufacturing methods, electrification, and chemicalization. Machine intensity substantially increased from 1890 to 1910, as measured by horsepower installed per persons employed in manufacturing. By contrast, the United States did not experience significant increases in electrical and chemical production until after 1910.

Of all the candidate leading sectors, the steel industry best fits the expectations of the LS mechanism regarding when industries transformed by radical innovations stimulated growth in the rising powers. Just as the 1780s were a period when the technological conditions for cotton production were transformed, the mid-nineteenth century featured major breakthroughs in the steel industry that allowed for the mass production of steel, such as the Siemens-Martin open-hearth furnace (1867) and Bessemer converter (1856).⁸³ Over the course of the IR-2 period, the United States and Germany quickly exploited these breakthroughs in steelmaking to massively boost steel production.

The overtaking of Britain by both Germany and the United States in total steel production by the early 1890s matches the timeline of Britain’s overall economic decline.⁸⁴ Paul Kennedy cites Germany’s booming steel output as a key factor driving its industrial rise; by 1914, German steel output was larger than that of Britain, France, and Russia combined.⁸⁵ Likewise, US steel output grew from one-fifth of British production in 1871 to almost five times more than British steel output in 1912.⁸⁶ Given these impressive figures, the next section investigates the American and German advantages in steel production in further detail.

Phase of Relative Advantage: The American System’s Diffusion

The second dimension on which the GPT and LS trajectories differ relates to the phase of technological change that accounted for the relative success of the United States in the IR-2. Cross-country historical evidence on the IR-2’s technological drivers illustrates that the United States had true comparative advantages over other advanced economies that were rooted in its absorption and diffusion capabilities.

INNOVATION CLUSTERING IN STEEL, ELECTRICITY, CHEMICALS, AND/OR MOTOR VEHICLES?

In electricity, industrial powers fiercely contested innovation leadership as the United States, Germany, Great Britain, and France all built their first central power stations, electric trams, and alternating current power systems within a span of nine years.⁸⁷ However, the United States clearly led in diffusing these systems: US electricity production per capita more than doubled that of Germany, the next closest competitor, in 1912. Along this metric of electrification, Britain’s level was just 20 percent of the US figure.⁸⁸

To be clear, Britain fell behind in adopting electrification, even though it introduced some of the most significant electrical innovations.⁸⁹ In 1884, for example, British inventor Charles Parsons demonstrated the first steam turbine for practical use, an essential step for commercializing electric power, but this technology was more rapidly and widely adopted in other countries.⁹⁰ The British Institution of Electrical Engineers aptly captured this phenomenon in an 1892 resolution: “Notwithstanding that our countrymen have been among the first in inventive genius in electrical science, its development in the United Kingdom is in a backward condition, as compared with other countries, in respect of practical application to the industrial and social requirements of the nation.”⁹¹

In chemicals, the achievements of both the US and German chemical industries suggest that no single country monopolized innovation in this sector. Germany’s synthetic dye industry excelled not because it generated the initial breakthroughs in aniline-violet dye processes—in fact, those were first pioneered in Britain—but because it had perfected these processes for profitable exploitation.⁹² Similar dynamics characterized the US chemical industry.⁹³

In most cases, the United States was not the first to introduce major innovations in leading sectors. Many countries introduced major innovations in chemicals, electricity, motor vehicles, and steel during this period (table 4.3).⁹⁴ Across the four candidate leading sectors, American firms pioneered less than 30 percent of the innovations. Contrary to the propositions of the LS mechanism, innovations in steel, electricity, chemicals, and motor vehicles were spread across the leading economies.

TABLE 4.3. Geographic Distribution of Major Innovations in Leading Sectors, 1850–1914

Country	Chemicals	Electricity	Motor Vehicles	Steel
France	2	1	1	1
Germany	3	3	3	0
Great Britain	1	3	1	1
United States	2	3	1	0
Various other countries	0	1	0	2
Sole US share	25%	27%	17%	0%
Source: Van Duijn 1983, 176–79 (compilation of 160 innovations introduced during the nineteenth and twentieth centuries).
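The shares in the bottom row of table 4.3 can be reproduced from the counts above. The following is a minimal sketch, not part of the original analysis: the dictionary layout and the function names (`us_share`, `overall_us_share`) are illustrative assumptions, and it assumes each innovation is counted once, under a single row of the table.

```python
# Counts transcribed from table 4.3 (Van Duijn 1983).
# The data layout and helper names are illustrative assumptions.
SECTORS = ["Chemicals", "Electricity", "Motor Vehicles", "Steel"]

counts = {
    "France": [2, 1, 1, 1],
    "Germany": [3, 3, 3, 0],
    "Great Britain": [1, 3, 1, 1],
    "United States": [2, 3, 1, 0],
    "Various other countries": [0, 1, 0, 2],
}


def us_share(sector: str) -> float:
    """Fraction of a sector's listed innovations attributed to the United States."""
    i = SECTORS.index(sector)
    total = sum(row[i] for row in counts.values())
    return counts["United States"][i] / total


def overall_us_share() -> float:
    """US fraction of all listed innovations across the four sectors."""
    total = sum(sum(row) for row in counts.values())
    return sum(counts["United States"]) / total


# Per-sector US shares, rounded to whole percentages as in the table.
shares = {s: round(100 * us_share(s)) for s in SECTORS}

for sector, pct in shares.items():
    print(f"{sector}: {pct}%")
print(f"All four sectors: {overall_us_share():.1%}")
```

Under these assumptions, the per-sector shares match the table's bottom row, and the pooled US share (6 of 29 listed innovations) is consistent with the statement that American firms pioneered less than 30 percent of the innovations.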

Moreover, the limited role of electrical and chemical exports in spurring American growth casts further doubt on the significance of monopoly profits from being the first to introduce new advances.⁹⁵ The British share of global chemical exports almost doubled the US share in 1913.⁹⁶ Overall, the United States derived only 8 percent of its national income from foreign trade in 1913, whereas the corresponding proportion for Britain was 26 percent.⁹⁷ Even though the United States was the quickest to electrify its economy, Germany captured around half of the world’s exports in electrical products.⁹⁸

If monopoly profits from innovation clustering in any leading sector propelled the industrial rise of the United States and Germany, it would be the steel industry. Both nations made remarkable gains in total steel output over this period, and scholars commonly employ crude steel production as a key indicator of British decline and the shifting balance of industrial power in the decades before World War I.⁹⁹ Thus, having established the delayed impact of the electrical, chemical, and automobile industries in this period, the steel industry takes on an especially large burden for the LS mechanism’s explanatory power in this period.

Yet Britain capitalized on many major innovations in steelmaking, including the Talbot furnace, which became essential to producing open-hearth steel.¹⁰⁰ Moreover, trade patterns reveal that Britain still held a comparative advantage in the export of steel between 1899 and 1913.¹⁰¹ How to square this with Germany’s dominance in total steel output?

The prevailing wisdom takes total steel output figures to stand for superior American and German technological know-how and productivity.¹⁰² In truth, new steelmaking processes created two separate steel industries. Britain shifted toward producing open-hearth steel, which was higher in quality and price. According to the British Iron Trade Association, Britain produced about four times more open-hearth steel than Germany in 1890.¹⁰³ On the other hand, Germany produced cheap Thomas steel and exported a large amount at dumping prices. In fact, some of Germany’s steel exports went to Britain, where they were processed into higher-quality steel and re-exported.¹⁰⁴ In sum, this evidence questions what one scholar deems “the myth of the technological superiority and outstanding productivity of the German steel industry before and after the First World War.”¹⁰⁵

AMERICAN MACHINE TOOLS—GPT DIFFUSION ADVANTAGE

Though new industries like electricity and chemicals hog much of the spotlight, developments in machine tools underpin the most important channel between differential rates of technology adoption and the IR-2’s economic power transition. After noting the importance of the electrical and chemical industries during the period, British historian Eric Hobsbawm elevates the importance of machine tools: “Yet nowhere did foreign countries—and again chiefly the USA—leap ahead more decisively than in this field.”¹⁰⁶

In line with the expectations of GPT diffusion theory, comparative estimates confirm a substantial US lead in mechanization in the early twentieth century. In 1907, machine intensity in the United States was more than two times higher than rates in Britain and Germany.¹⁰⁷ In 1930, the earliest year for which such data are available, Germany lagged behind the United States in installed machine tools per employee across manufacturing industries by 10 percent, with a significantly wider gap in the tools most crucial for mass production.¹⁰⁸

This disparity in mechanization was not rooted in the exclusive access of the United States to special innovations in machine tools. In terms of quality, British machine tools were superior to their American counterparts throughout the IR-2 period.¹⁰⁹ German firms also had advantages in certain fields like sophisticated power technology.¹¹⁰ Rather, the distinguishing feature of the US machine tool industry was excellence in diffusing innovations across industries.¹¹¹ Reports by British and German study trips to the United States provide some of the most detailed, reliable accounts of transatlantic differences in manufacturing methods. German observers traveled to the United States to learn from their American competitors and eventually imitate American interchangeable manufacturing methods.¹¹² British inspection teams reported that the US competitive edge came from the “adaptation of special apparatus to a single operation in almost all branches of industry”¹¹³ and “the eagerness with which they call in the aid of machinery in almost every department of industry.”¹¹⁴

Fittingly, one of the most colorful denunciations of American innovation capacity simultaneously underscored its strong diffusion capacity. In an 1883 address to the American Association for the Advancement of Science, Henry Rowland, the association’s vice president, denigrated the state of American science for its skew toward the commercialization of new advances. Rowland expressed his disgust with media representations that upheld the “obscure American who steals the ideas of some great mind of the past, and enriches himself by the application of the same to domestic uses” over “the great originator of the idea, who might have worked out hundreds of such applications, had his mind possessed the necessary element of vulgarity.”¹¹⁵ Yet, it was America’s diffusion capacity—in all its obscurity and vulgarity—that sustained its growth to economic preeminence.

Breadth of Growth: The Wide Reach of Interchangeable Manufacture

What were the sources of American productivity growth in the IR-2? The pattern of American economic growth is most pertinent to investigate because the United States overtook Britain in productivity leadership during the IR-2. Regarding the breadth of economic growth, the LS trajectory expects that American productivity growth was concentrated in a narrow set of modernized industries, whereas the GPT trajectory holds that American productivity growth was dispersed across a broad range of industries. Sector-level estimates of total factor productivity (TFP) growth provide useful evidence to assess these diverging propositions.

WIDESPREAD PRODUCTIVITY GROWTH

The historical data support GPT diffusion theory’s expectation of pervasive US productivity growth. John Kendrick’s detailed study of US productivity growth in this period depicts a relatively balanced distribution. Among the industries studied, nearly 60 percent averaged between 1 and 3 percent increases in output per labor-hour from 1899 to 1909.¹¹⁶ Broad swathes of the US economy, outside of the leading sectors, experienced technological change. For instance, the service sector, which included segments of the construction, transport, wholesale, and retail trade industries, played a key role in the US capacity to narrow the gap with Britain in productivity performance.¹¹⁷

R&D-centric sectors were not the primary engines of US growth. In a recent update to Kendrick’s estimates, a group of researchers estimated how much of US productivity growth was driven by “great inventions sectors,” a designation that roughly corresponds to this chapter’s candidate leading sectors.¹¹⁸ They found that these sectors accounted for only 29 percent of US TFP growth from 1899 to 1909.¹¹⁹ Despite employing 40 percent of all research scientists in 1920, the chemical industry was responsible for only 7 percent of US TFP growth throughout the following decade.¹²⁰

MACHINE TOOLS AND BROADLY DISTRIBUTED PRODUCTIVITY GROWTH

Broad-based productivity growth in the US economy does not necessarily mean that a GPT was at work. Macroeconomic factors or the accumulation of various, unconnected sources of TFP growth could produce this outcome. Therefore, if the GPT trajectory captures the breadth of growth in the IR-2, then the historical evidence should connect broadly distributed productivity growth in the United States to developments in machine tools.

The extension of the American system boosted productivity in a wide range of sectors. Applications of this system of special tools reshaped the processes of making firearms, furniture, sewing machines, bicycles, automobiles, cigarettes, clocks, boots and shoes, scientific instruments, typewriters, agricultural implements, locomotives, and naval ordnance.¹²¹ Its influence covered “almost every branch of industry where articles have to be repeated.”¹²² Per a 1930 inventory of American machine tools, the earliest complete survey, nearly 1.4 million metalworking machines were used across twenty industrial sectors.¹²³ In his seminal study of American productivity growth during this period, Kendrick identifies progress in “certain types of new products developed by the machinery and other producer industries [that] have broad applications across industry lines” as a key source of the “broad, pervasive forces that promote efficiency throughout the economy.”¹²⁴

The breadth of productivity spillovers from machine tools was not boundless. Machine-using industries constituted a minority of the manufacturing industries, which themselves accounted for less than one-quarter of national income.¹²⁵ However, users of new machine tools extended beyond just manufacturing industries. Technologically intensive services, such as railroads and steam transportation, also benefited significantly from improved metalworking techniques.¹²⁶ In agriculture, specialized machine tools helped advance the introduction of farm machinery such as the reaper, which revolutionized agricultural productivity.¹²⁷

In explaining how machine tools served as a transmission center in the US economy, Rosenberg describes the industry as a pool of skills and technical knowledge that replenishes the economy’s machine-using sectors: an innovation that addresses one industry’s problem gets added to the pool and becomes available, with a few modifications, for all technologically related industries.¹²⁸ As sales records from leading machine tool firms show, many application sectors purchased the same type of machine. In 1867, Brown and Sharpe Manufacturing Company sold the universal milling machine, just five years after its invention, not only to machinery firms that made tools for a diverse range of industries but also to twenty-seven other firms that produced everything from ammunition to jewelry.¹²⁹ In this way, the machine tool industry functioned, in Rosenberg’s words, as “a center for the acquisition and diffusion of new skills and techniques in a machinofacture type of economy.”¹³⁰

Indeed, advances in machine tools had economy-wide effects. The social savings method estimates how much a new technology contributed to economic growth, compared to a counterfactual situation in which the technology had not been invented.¹³¹ Referencing this method to differentiate between the impacts of new technologies in this period, economic historian Joel Mokyr puts forward the American system of manufacturing as the most important:

From a purely economic point of view, it could be argued that the most important invention was not another chemical dye, a better engine, or even electricity.… There is one innovation, however, for which “social savings” calculations from the vantage point of the twentieth century are certain to yield large gains. The so-called American System of manufacturing assembled complex products from mass-produced individual components. Modern manufacturing would be unthinkable without interchangeable parts.¹³²

Institutional Complementarities: GPT Skill Infrastructure in the IR-2

With confirmation that the pattern of technological change in the IR-2 is better characterized by the GPT trajectory, the natural next step is to probe variation among leading economies in adapting to this trajectory. Why was the United States more successful than Britain and Germany in adapting to the demands of interchangeable manufacture? According to GPT diffusion theory, the historical evidence should reveal that the US edge was based on education and training systems that broadened and systematized mechanical engineering skills. These institutional adaptations would have resolved two key bottlenecks in the spread of interchangeable manufacture: a shortage of mechanical engineering talent and ineffective coordination between machine tool producers and users.

Widening the Base of Mechanical Engineers

Which institutions for skill formation were most central to the ability of the United States to take advantage of new advances in machine tools? Established accounts of economic rivalry among great powers in the IR-2 focus on skills linked to major innovations in new, science-based industries. Emphasizing Germany’s advantage in training scientific researchers, these studies attribute Germany’s technological success in this period to its investments in R&D facilities and advanced scientific and technical education.¹³³ Such conclusions echo early-twentieth-century British accounts of Germany’s growing commercial prowess, which lauded German higher education for awarding doctorates in engineering and for its qualitative superiority in scientific research.¹³⁴

American leadership in the adoption of interchangeable manufacturing methods was beholden to a different set of institutions for skill formation. Progress in this domain did not depend on new scientific frontiers and industrial research laboratories.¹³⁵ In fact, the United States trailed both Britain and Germany in scientific achievements and talent.¹³⁶ Widespread mechanization in the United States rested instead on a broad base of mechanical engineering skills.

Alongside the development of more automatic and precise machine tools throughout the nineteenth century, this new trajectory of mechanization demanded more of machinists and mechanical engineers. Before 1870, US firms relied on informal apprenticeships at small workshops for training people who would design and use machine tools.¹³⁷ At the same time, engineering education at independent technical schools and traditional colleges and universities did not prioritize mechanical engineers but was mostly oriented toward civil engineering.¹³⁸ Yet craft-era methods and skills were no longer sufficient to handle advances that enhanced the sophistication of machine tools.¹³⁹ Thus, in the mid-nineteenth century, the US potential for mechanization was significantly constrained by the need for more formal technical instruction in mechanical engineering.

Over the next few decades, advances on three main fronts met this need for a wider pool of mechanical engineering expertise: land-grant schools, technical institutes, and standardization efforts. In 1862, the US Congress passed the first Morrill Land-Grant Act, which financed the creation of land-grant colleges dedicated to the agricultural and mechanical arts. Although some of these schools offered low-quality instruction and initially restricted their mission to agricultural concerns, the land-grant funds also supported many important engineering schools, such as the Massachusetts Institute of Technology (MIT) and Cornell University.¹⁴⁰ The number of US engineering schools multiplied from six in 1862, when the Morrill Act was passed, to 126 in 1917.¹⁴¹ These schools were especially significant in widening the base of professional mechanical engineers. In 1900, out of all students pursuing mechanical engineering at US higher education institutions, 88 percent were enrolled in land-grant colleges.¹⁴²

The establishment of technical institutes also served demands for mechanical engineering training. Pure technical schools like the Worcester Polytechnic Institute, founded in 1868, and the Stevens Institute of Technology, founded in 1870, developed mechanical engineering curricula that would become templates for engineering programs at universities and colleges.¹⁴³ Embedded with local and regional businesses, technical institutes developed laboratory exercises that familiarized students with real-world techniques and equipment. In this respect, these institutes and land-grant colleges “shared a common belief in the need to deliver a practice-oriented technical education.”¹⁴⁴

Another significant development in the spread of mechanical engineering knowledge was the emergence of professional engineering societies that created industrial standards. The most prominent of these were the American Society of Mechanical Engineers (ASME), founded in 1880, the American Section of the International Association for Testing Materials, set up in 1898, and the Franklin Institute, which became America’s leading technical society around the start of the IR-2.¹⁴⁵ As these associations coordinated to share best practices in mechanical engineering, they improved knowledge flows between the machine tool industry and application sectors.¹⁴⁶ Standardization in various machine processes and components, such as screw threads, helped spread mechanization across disparate markets and communities.¹⁴⁷

It should be emphasized that these efforts were effective in producing the skills and knowledge necessary for advancing mechanization because they broadened the field of mechanical engineering. Mechanical engineering instruction at land-grant schools and technical institutes and through professional associations allowed for more students to become “average engineers,” as opposed to “the perpetuation of a self-recognized elite.”¹⁴⁸ Recent research finds that this diffused engineering capacity produced enduring benefits for American industrialization. By collecting granular data on engineering density for the United States at the county level, William Maloney and Felipe Caicedo capture the engineering talent spread across various US counties in 1880 and parse the effect of engineering capacity on industrial outcomes decades later. They find that there is a statistically significant, positive relationship between the level of engineering density in 1880 and the level of industrialization decades later.¹⁴⁹

The Comparative Advantage of the United States over Britain and Germany in GPT Skill Infrastructure

Both Britain and Germany fell short of the US standard in GPT skill infrastructure. For Britain, the key gap was in the supply of mechanical engineering talent. British educational institutions and professional bodies fiercely guarded the apprenticeship tradition for training mechanical engineers.¹⁵⁰ For instance, the University of Oxford did not establish an engineering professorship until 1908.¹⁵¹ Meanwhile, American engineers systematically experimented with machine redesigns, benefiting from their training at universities and technical institutes.

These diverging approaches resulted in stark differences in skill formation. In 1901, probably around 2,600 students were enrolled in full-time higher technical education in the United Kingdom.¹⁵² Limiting this population to those in their third or fourth year of full-time study—an important condition because many UK programs, unlike German and American institutions, did not progress beyond two years of study—leaves only about 400 students.¹⁵³ By comparison, in mechanical engineering programs alone, the United States in 1900 had 4,459 students enrolled in higher technical education.¹⁵⁴ Controlling for population differences, the United States substantially outpaced Britain in engineering density, as measured by the number of university-educated engineers per 100,000 male laborers.¹⁵⁵

Germany developed a more practical and accessible form of higher technical education than Britain. From 1870 to 1900, enrollments in the German technische Hochschulen increased nearly fourfold, from 13,674 to 32,834 students.¹⁵⁶ Alongside the technische Mittelschulen (technical intermediate schools comparable to American industrial trade schools and lower-level engineering colleges), the technische Hochschulen cultivated a broad base of mechanical engineers.¹⁵⁷ Germany’s system of technical education attracted admirers from around the world. Some went there to study in the schools, and others went to study how the school system worked, with the aim of borrowing elements of the German model.¹⁵⁸

Germany’s problems lay in weak linkages between mechanical engineering education and industrial applications. Key German standards bodies and technical colleges prioritized scientific and theoretical education at the expense of practical skills, a trend “most pronounced in mechanical engineering.”¹⁵⁹ According to an expert on German standard-setting in this period, “no national standards movement was inaugurated in [the machine industry] until after the outbreak of [World War I].”¹⁶⁰ German experts on engineering education, intent on reforming technical instruction to get engineers more experience with factory organization and project management in the field, recommended, for example, that practical training courses be offered in partnerships with engineering associations.¹⁶¹ Articles in the Zeitschrift des Vereines Deutscher Ingenieure (Journal of the Association of German Engineers) lamented that the technische Hochschulen and technical universities were not equipping students with practical skills to operate in and manage factories and workshops.¹⁶² These issues slowed Germany’s incorporation of interchangeable parts and advanced machine tools.

A report by Professor Alois Riedler of the Technical University of Berlin, who was commissioned by the Prussian Ministry of Education to tour American engineering schools in the 1890s, illustrates the differences in engineering education between the United States and Germany. According to Riedler, extensive practical training and experience with shop and laboratory applications were distinctive features of an American engineering education. To substantiate differences in practical instruction between engineering departments in the two countries, Riedler analyzed the time allocated to theoretical and practical instruction across four-year courses of study.¹⁶³ Compared to their German peers, American students spent far more time on exercises in mechanical technical laboratories and other types of practical training (figure 4.4). In the Technische Universität Berlin (Technical University Berlin), practical exercises in the laboratory and shop accounted for less than 6 percent of total instruction time over a four-year course of study.¹⁶⁴ In contrast, engineering students at Cornell University spent more than one-third of their course engaged in laboratory study and shopwork. As a consequence of reports by Riedler and others, German institutions began establishing laboratories for mechanical engineering around 1900.¹⁶⁵

FIGURE 4.4. Comparison of Curricula at German and American Engineering Schools (1893). *Source*: US Bureau of Education (BOE) 1895, 684–86. *Note*: In this BOE report, the German schools are labeled Technological University in Austria, Technological University in Prussia, and Technological University in South Germany. A reasonable assumption, informed by background research, is that these refer to Technical University Wien, Technical University Berlin, and Technical University of Munich, respectively. Though TU Wien is in Austria, it is used to illustrate trends in German engineering education because many German schools saw it as an influential model.

It should be made clear that, in the United States, institutional adaptations to new opportunities presented by interchangeable manufacture were not rooted in cultivating highly skilled scientific talent. The best and brightest American scientists furthered their education at European universities.¹⁶⁶ Even proponents of American engineering education concluded that “strictly scientific and intellectual education in American technological schools” did not even match “the average of a secondary industrial school” in Germany.¹⁶⁷ According to one study conducted by the National Association of German-American Technologists, an organization that regularly circulated ideas between the two countries, German technical institutes held an edge over their US peers in research on different technologies in mechanical engineering.¹⁶⁸

The deeper roots of US institutions’ greater effectiveness at adapting to the skill formation needs of interchangeable manufacture cannot be fully explored here. The legacy of the Morrill Act certainly looms large, as do the contributions of a diverse set of institutional adaptations unrelated to that groundbreaking federal policy, including independent centers like the Franklin Institute, technical high schools, professional associations, and specialized engineering programs initiated at preexisting universities.¹⁶⁹ Other potential sources for the US advantage in GPT skill infrastructure include its openness to foreign technicians and the unique challenges and culture of the American frontier.¹⁷⁰

LS-Based Theories and Chemical Engineering

Analyzing the education and training systems for chemical advances provides a secondary test to determine which institutions are most apt to bring national success in technological revolutions.¹⁷¹ LS accounts typically point to Germany’s innovation capacity as the key determinant of its competitiveness in chemicals, especially in the key segment of synthetic dye production.¹⁷² To extend this lead in synthetic dyes, Germany profited from leading industrial research labs and scientific education institutions, which employed the world’s top academic chemists and produced about two-thirds of the world’s chemical research.¹⁷³

By comparison, the US capacity to innovate in chemicals was weak. From 1901 to 1930, only one American researcher received a Nobel Prize in Chemistry, while German and British researchers captured almost three-fourths of the Nobel Prizes in Chemistry in that span.¹⁷⁴ In 1899, German publications accounted for half of all citations in American chemical journals, essentially double the share credited to American publications.¹⁷⁵ At the same time, American scholarship barely registered in Europe-based chemistry journals, where the best research was published. According to one analysis of references in Annual Reports on the Progress of Chemistry, an authoritative British review journal, American publications accounted for only 7 percent of the citations in 1904.¹⁷⁶

As was the case with machine tools, effective adaptation to new chemical technologies in the United States rested on a different set of institutional competencies. Despite trailing Germany in chemical breakthroughs and top chemists, the United States pioneered a chemical engineering discipline that facilitated the gradual chemicalization of many industries. A crucial step in this process was the emergence of unit operations, which broke down chemical processes into a sequence of basic operations (for example, condensing, crystallizing, and electrolyzing) that were useful to many industries, including ceramics, food processing, glass, metallurgy, and petroleum refining.¹⁷⁷ American institutions of higher education, most notably MIT, quickly adopted the unit operations model and helped cultivate a common language and professional community of chemical engineering.¹⁷⁸ As Rosenberg and Steinmueller conclude, “American leadership in introducing a new engineering discipline into the university curriculum, even at a time when the country was far from the frontier of scientific research, was nowhere more conspicuous than in the discipline of chemical engineering early in the 20th century.”¹⁷⁹

In contrast, Germany was slow to develop the infrastructure for supporting chemical engineers. Up through the interwar period, the chemical engineering profession “failed to coalesce in Germany.”¹⁸⁰ Chemical engineering did not become a distinct academic subject area in Germany until after the Second World War.¹⁸¹ Because German universities did not equip chemists with engineering skills, the burden of training chemists was shifted to firms.¹⁸² Additionally, the German chemical industry maintained a strict division of labor between chemists and mechanical engineers. The lack of skill systematization resulted in more secrecy, less interfirm communication, and a failure to exploit externalities from common chemical processes.¹⁸³

The United States reaped the spoils of technological convergence in chemicalization not just because it trained large numbers of chemical engineers but also because it strengthened university-industry linkages and standardized techniques in chemical engineering.¹⁸⁴ Without the connective tissue that promotes information flows between the chemical sector and application sectors, a large base of chemical engineers was insufficient. Britain, for instance, was relatively successful at training chemical engineers during the interwar period; however, the weak links between British educational institutions and industrial actors limited the dissemination of technical knowledge, and the concept of unit operations did not take hold in Britain to the degree that it did in America.¹⁸⁵ Additionally, professional engineering associations in the United States, including the American Institute of Chemical Engineers, advanced standardization in the chemical industry, an initiative not imitated in Britain until a decade later.¹⁸⁶ Unlike their American peers, British chemical engineers did not see themselves until after World War II as “members of a professional group that shared a broad commonality cutting across the boundary lines of a large number of industries.”¹⁸⁷

Since substantial, economy-wide benefits from these chemical breakthroughs did not materialize until after the end of the IR-2 period, it is important to not overstate these points. Nonetheless, tracing which country best exploited chemical innovations through the interwar period can supplement the analysis of institutional complementarities for machine tools.¹⁸⁸ Evidence from the coevolution of chemical technologies and skill formation institutions further illustrates how institutional adaptations suited to GPT trajectories differed from those suited to LS trajectories.

Alternative Factors

Like its predecessor, the IR-2 has been the subject of countless studies. Scholars have thoroughly investigated the decline of Britain and the rise of the United States and Germany, offering explanations ranging from immigration patterns and cultural and generational factors to natural resource endowments and labor relations.¹⁸⁹ My aim is not to sort through all possible causes of British decline. Rather, I am tracing the mechanisms behind an established connection between the IR-2’s technological breakthroughs and an economic power transition. Thus, the contextual factors most likely to confound the GPT diffusion explanation are those that provide an alternative explanation of how significant technological changes translated into the United States supplanting Britain in economic leadership. Aside from the LS mechanism, which has been examined in detail, the influence of international security threats and varieties of capitalism deserves further examination.

Threat-Based Explanations

How did external threats influence technological leadership in the IR-2? Scholars have argued that US military investment, mobilized against the threat of a major war, was crucial to the development of many GPTs.¹⁹⁰ US national armories’ subsidization of the production of small arms with interchangeable parts in the early nineteenth century was crucial, some studies argue, to the diffusion of the American system to other industries in the second half of the century.¹⁹¹

Though firearms production provided an important experimental ground for mechanized production, military support was not necessary to the development of the American system of manufacturing. Questioning the necessity of government funding and subsidies for the spread of the American system, one study credits the development of interchangeable manufacture to four civilian industries: clock manufacturing, ax manufacturing, typewriter manufacturing, and watch manufacturing.¹⁹² In particular, the clock industry played a crucial role in diffusing mechanized production practices. More attuned to the dynamics of the civilian economy than the small arms manufacturers, clockmakers demonstrated that interchangeable manufacture could drastically increase sales and cut costs.¹⁹³ In his definitive study of the history of American interchangeable parts manufacture, David Hounshell concludes that “the sewing machine and other industries of the second half of the 19th century that borrowed small arms production techniques owed more to the clock industry than to firearms.”¹⁹⁴

Military investment and government contracting did not provide long-term sources of demand for interchangeable manufacturing methods.¹⁹⁵ Over the course of the IR-2, the small arms industry’s contribution to American manufacturing declined, totaling less than 0.3 percent of value added in American industry from 1850 to 1940.¹⁹⁶ Thus, arguments centered on military investment neglect that the spread of the American system, not its initial incubation, is the focal point for understanding how the IR-2 catalyzed an economic power transition.¹⁹⁷

Another threat-based argument posits that countries that face more external threats than internal rivalries will achieve more technological success.¹⁹⁸ In the IR-2 case, however, the United States was relatively isolated from external conflicts, while the United Kingdom and Germany faced many more threats (including each other).¹⁹⁹ Moreover, the United States was threatened more by internal rivalries than by external enemies at the beginning of the IR-2, as it had just experienced a civil war.²⁰⁰ This argument therefore provides limited leverage in the IR-2 case.

VoC Explanations

What about the connection between America’s particular type of capitalism and its economic rise? Rooted in the varieties of capitalism (VoC) tradition, one alternative explanation posits that the United States was especially suited to embrace the radical innovations of the IR-2 because it exhibited the characteristics of a liberal market economy (LME). GPT diffusion theory and VoC-based explanations clash most directly on points about skill formation. The latter’s expectation that LMEs like the United States should excel at cultivating general skills could account for US leadership in GPT diffusion during this period.²⁰¹

The empirical evidence casts doubt on this explanation. In the early twentieth century, the leading nations had fairly similar levels of enrollment rates in elementary and post-elementary education. In 1910, the enrollment rate for children ages five to nineteen in the United States was about 12 percent lower than Britain’s rate and only 3 percent higher than Germany’s.²⁰² Years of education per worker increased by essentially the same proportion in both Britain (by a factor of 2.2) and the United States (by a factor of 2.3) between 1870 and 1929.²⁰³ In terms of higher education expenditures per capita, the two countries were essentially tied.²⁰⁴ Differences in the formation of general skills cannot account for the technological leadership of the United States in this period.²⁰⁵

Moreover, the degree to which the United States fully embraced the characteristics of LMEs is disputed. In the view of studies that position the United States as a model for managerial capitalism in this period, US industrial governance structures enabled the rise of giant managerialist firms.²⁰⁶ This approach primarily sees America’s rise to industrial preeminence through the most visible actors in the American system of political economy: oligopolies in the automobile, steel, and electrical industries. There was significant diversity, however, in firm structure. Though many giant corporations did grow to take advantage of economies of scale and capital requirements in some mass-produced goods (such as automobiles), networks of medium-sized firms still dominated important segments of these new industries, such as the production of electric motors. One-third of the fifty largest manufacturing plants in the United States made custom and specialty goods.²⁰⁷ From 1899 to 1909, sectors that relied on batch and custom production, including machine tools, accounted for one-third of value added in manufacturing.²⁰⁸ No specific brand of capitalism fulfilled the demands of production across all domains.²⁰⁹

Case-Specific Factors

Another traditional explanation highlights American natural resource abundance as a key factor in transatlantic differences in mechanization. Compared to its European competitors, the United States benefited from an endowment of natural resources, such as plentiful supplies of timber, that biased its manufacturing processes toward standardized production.²¹⁰ “The American turn in the direction of mass production was natural,” claims one influential study of diverging approaches to mechanization in this period.²¹¹

The extent to which natural resource endowments determined transatlantic differences in technological trajectories is disputed. Undeterred by natural resource differences, German engineers and industrialists frequently used American machine tools and imitated US production technology.²¹² In fact, around the early twentieth century, the level of machine intensity in German industries was catching up to the rate in American industries.²¹³ The US-Germany gap in mechanization was more about Germany’s struggles in proficiently using advanced tools than about a choice of methods shaped by natural resource endowments. Crucially, skill formation and embedded knowledge about working with new machinery influenced the efficient utilization of American capital-intensive techniques.²¹⁴

Summary

The standard version of the Second Industrial Revolution’s geopolitical aftershocks highlights Germany’s challenge to British power. Germany’s relative economic rise, according to this account, derived from its advantage in industrial research and scientific infrastructure, which enabled it to capture the gains from new industries such as electricity and chemicals. However, a range of indicators emphasize that it was the United States, not Germany, that surpassed Britain in productivity leadership during this period. The US industrial ascent in the IR-2 illustrates that dominating innovation in leading sectors is not the crucial mechanism in explaining the rise and fall of great powers. Britain’s decline was not a failure of innovation but of diffusion. As the renowned economist Sir William Arthur Lewis once mused, “Britain would have done well enough if she merely imitated German and American innovations.”²¹⁵

Indeed, the IR-2 case further supports the conclusion that capacity to widely diffuse GPTs is the key driver of long-term growth differentials. The US success in broadening its talent base in mechanical engineering proved critical for its relative advantage in adapting machine tools across a broad range of industries. Like all GPT trajectories, this process was a protracted one, but it aligns better with when the United States surpassed Britain in productive leadership than the more dramatic breakthroughs in chemicals, electricity, and automobiles. To further investigate how the LS mechanism breaks down and why the GPT mechanism holds up, we turn to the high-tech competition in the twentieth century between the United States and Japan—or what some label the Third Industrial Revolution.

5 Japan’s Challenge in the Third Industrial Revolution

In the two previous cases, an industrial revolution preceded a shift in global leadership. Britain established its economic dominance in the early nineteenth century, and the United States took the mantle in the late nineteenth century. During the last third of the twentieth century (1960–2000), the technological environment underwent a transformation akin to the First and Second Industrial Revolutions. A cluster of information technologies, connected to fundamental breakthroughs in computers and semiconductors, disrupted the foundations of many industries. The terms “Third Industrial Revolution” (IR-3) and “Information Age” came to refer to an epochal shift from industrial systems to information-based and computerized systems.¹ Amid this upheaval, many thought Japan would follow in the footsteps of Britain and the United States to become the “Number One” technological power.²

Of the countries racing to take advantage of the IR-3, Japan’s remarkable advances in electronics and information technology garnered a disproportionate share of the spotlight. “The more advanced economies, with Japan taking the lead in one industry after another, [were] restructuring their economies around the computer and other high tech industries of the third industrial revolution,” Gilpin wrote.³ In the late 1980s and early 1990s, a torrent of works bemoaned the loss of US technological leadership to Japan.⁴ In a best-selling book on US-Japan relations, Clyde Prestowitz, a former US trade negotiator, declared, “Japan has … become the undisputed world economic champion.”⁵

Japan’s dominance in the IR-3’s leading sectors was perceived as a threat to international security and to US overall leadership of the international system.⁶ Former secretary of state Henry Kissinger and other prominent thinkers warned that Japan would convert its economic strength into threatening military power.⁷ Per a 1990 New York Times poll, 58 percent of Americans believed that Japan’s economic power was more of a threat to American security than the Soviet Union’s military power.⁸

Historical precedents loomed over these worries. US policymakers feared that falling behind Japan in key technologies would, like relative declines experienced by previous leading powers, culminate in an economic power transition. Paul Kennedy and other historically minded thinkers likened the US position in the 1980s to Britain’s backwardness a century earlier: two industrial hegemons on the brink of losing their supremacy.⁹ Often alluding to the LS mechanism, these comparisons highlighted Japan’s lead in specific industries that were experiencing significant technological disruption, such as consumer electronics and semiconductors. As David Mowery and Nathan Rosenberg wrote in 1991, “Rapidly growing German domination of dyestuffs helped to propel that country into the position of the strongest continental industrial power. The parallels to the Japanese strategy in electronics in recent decades are striking.”¹⁰

Many voices called for the United States to mimic Japan’s keiretsu system of industrial organization and proactive industrial policy, which they viewed as crucial to the rising power’s success with leading sectors.¹¹ Kennedy’s The Rise and Fall of the Great Powers attributed Japan’s surge in global market shares of high-tech industries to R&D investments and the organizing role of the Ministry of International Trade and Industry (MITI).¹² These claims about the basis of Japan’s leadership in the information revolution relied on LS product cycles as the filter for the most important institutional factors.

The feared economic power transition, however, never occurred. To be sure, Japanese firms did take dominant positions in key segments of high-growth industries like semiconductors and consumer electronics. Additionally, the Japanese economy did grow at a remarkable pace, averaging an annual 2.4 percent increase in total factor productivity (TFP) between 1983 and 1991. However, Japan’s TFP growth stalled at an average of 0.2 percent per year in the 1990s—a period known as its “lost decade.” By 2002, the per capita GDP gap between Japan and the United States was larger than it had been in 1980.¹³ Becoming the world’s leading producer in high-tech industries did not catalyze Japan’s overtaking of the United States as the leading economy.
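To give a rough sense of scale for these growth rates, the sketch below compounds an average annual TFP growth rate over a span of years. The function name and the year counts (eight years for 1983 to 1991, ten years for the 1990s) are illustrative assumptions rather than figures from the sources cited here, and treating an average rate as a constant annual rate is a simplification.

```python
def cumulative_growth(annual_rate: float, years: int) -> float:
    """Return total proportional growth from compounding a constant annual rate.

    annual_rate is a fraction (0.024 means 2.4 percent per year).
    The result is also a fraction (0.21 means 21 percent overall).
    """
    return (1.0 + annual_rate) ** years - 1.0


# Assumed spans: 8 years for 1983-1991, 10 years for the 1990s.
boom = cumulative_growth(0.024, 8)
lost_decade = cumulative_growth(0.002, 10)

print(f"1983-1991 at 2.4% per year: {boom:.1%} cumulative TFP growth")
print(f"1990s at 0.2% per year: {lost_decade:.1%} cumulative TFP growth")
```

Under these assumptions, the earlier period compounds to roughly 21 percent cumulative TFP growth, while the 1990s rate compounds to about 2 percent over the whole decade.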

The IR-3 case is particularly damaging for LS-based explanations. Japan took advantage of the IR-3’s opportunities by cornering the market in new, technologically progressive industries, fulfilling the conditions posited by the LS mechanism for Japan to become the foremost economic power. Yet, as the case study evidence will reveal, an economic power transition did not occur, even though all these conditions were present. The Japanese challenge to American technological leadership in the last third of the twentieth century therefore primarily functions as a deviant, or falsifying, case for the LS mechanism.¹⁴

By contrast, the IR-3 case does not undermine the GPT mechanism. Since Japan did not lead the United States in the diffusion of general-purpose information technologies, the conditions for an economic power transition under the GPT mechanism were absent in the IR-3. Since there could be many reasons why an economic power transition does not occur, the absence of a mechanism in a negative case provides limited leverage for explaining how technology-driven economic power transitions occur. Still, the IR-3 case evidence will show that LS theory expects an outcome that does not occur—a US-Japan economic power transition—in part because it fails to account for the relative success of the United States in GPT diffusion. This advantage stemmed from its superior ability to cultivate the computer engineering talent necessary to advance computerization. In that regard, this deviant case can help form better mechanism-based explanations.¹⁵

Surprisingly, few scholars have revisited claims that Japan’s leadership in leading sectors meant that it was on its way to economic preeminence.¹⁶ Decades of hindsight bring not just perspective but also more sources to pore over. Revised estimates and the greater availability of data help paint a more granular picture of how the US-Japan productivity gap evolved in this period. To narrow down the crucial technological trajectories, I pieced together histories of semiconductors and other key technologies, comparative histories of technological development in the United States and Japan, and general economic histories of the IR-3. In addition, I leveraged bibliometric techniques to estimate the number of universities in both countries that could supply a baseline quality of software engineering education. Surveys on computer utilization by Japanese agencies, presentations on computer science education by Japanese and American analysts at international meetings, documents from the Edward A. Feigenbaum Papers collection, and back issues of Nikkei Computer (日経コンピュータ) at the Stanford University East Asia Library all helped flesh out the state of GPT skill infrastructure in the IR-3.

The evaluation of the GPT and LS mechanisms against historical evidence from the IR-3 proceeds as follows. The chapter first makes clear that a US-Japan economic power transition did not take place. Subsequently, it reviews and organizes the technological breakthroughs of the IR-3 into candidate leading sectors and GPTs. It then examines whether all the components of the GPT or LS mechanism were present. Since the outcome did not occur in this case, it is important to trace where the mechanisms break down. All the aspects of the LS mechanism were present in the IR-3, but the GPT mechanism was not operational because Japan fell behind the United States in diffusing information technologies across a broad range of sectors. Based on this evidence, the next section explains why institutional explanations rooted in LS trajectories are unconvincing. Before turning to alternative factors and explanations, the chapter analyzes whether GPT skill infrastructure was a factor in sustained US technological leadership.

A Power Transition Unfulfilled: Japan’s Rise Stagnates

In a 1983 article for Parade, the Pulitzer Prize–winning journalist David Halberstam described Japan’s industrial ascent as America’s “most difficult challenge for the rest of the century” and “a more intense competition than the previous political-military competition with the Soviet Union.”¹⁷ By the end of the century, however, the possibility of Japan displacing the United States as the technological hegemon was barely considered, let alone feared.¹⁸ The economic power transition that accompanied the IR-1 and IR-2 did not materialize in this case. Indeed, most indicators presented a clear trend: Japan’s economy catches up in the 1980s, stagnates in the 1990s, and ultimately fails to overtake the US economy in productivity leadership.

GDP PER-CAPITA INDICATORS

During the three decades after 1960, Japan’s economy experienced remarkable growth, reaching a GDP per capita in 1990 that was 81 percent of the US mark that year. In the following ten years, known as Japan’s “lost decade,” Japan’s growth in GDP per capita stalled. By 2007, Japan’s GDP per capita had dropped back down to 73 percent of that of the United States (figure 5.1).

INDUSTRIALIZATION INDICATORS

Comparative industrialization statistics tell a similar story. In terms of global manufacturing output, Japan gained on the United States through the 1970s and 1980s and nearly matched the United States, at 20 percent of global manufacturing output, in the early 1990s. Japan’s share of global manufacturing output subsequently declined to around 10 percent in 2010, while the US share increased in the 1990s and held at 20 percent until 2010.¹⁹ In manufacturing industries, Japan’s labor productivity growth from 1995 to 2004 averaged only 3.3 percent, whereas the United States averaged 6.1 percent in the same metric.²⁰

FIGURE 5.1. Japan’s Catch-up to the United States in GDP per Capita Stalls in the 1990s. *Note*: Real GDP per capita in 2011$ prices. *Source*: Maddison Project Database, version 2020 (Bolt and van Zanden 2020).

PRODUCTIVITY INDICATORS

Productivity statistics also reveal a general trend of convergence without overtaking. From a total factor productivity of just half that of the United States in 1955, Japan’s TFP grew steadily. By 1991, the productivity gap between the United States and Japan was only 5 percent. As was the case with the GDP per capita and industrialization figures, Japan’s productivity growth then slowed, and the gap between the United States and Japan widened during the 1990s (figure 5.2). Throughout this decade, Japan averaged just 0.2 percent annual TFP growth.²¹ By 2009, Japan’s TFP had dropped to only 83 percent of the US figure. The US-Japan labor productivity gap followed a similar course.²²

FIGURE 5.2. Japan’s Catch-up to the United States in Productivity Stalls in the 1990s. *Source*: Jorgenson, Nomura, and Samuels 2018, 18.

Key Technological Changes in the IR-3

Parsing through the different trajectories of technological change is a necessary first step to determine whether the LS and GPT mechanisms were operative in this period. This task is complicated by the tremendous technological changes that emerged in the IR-3, such as the first microprocessor (1971), the production of recombinant DNA (1972), the VHS format for video recording (1976), and the first personal computer (1981). Guided by past scholars’ efforts to map the key nodes of the information revolution as well as analytic measures for leading sectors and GPTs, this section takes stock of key technological drivers that affected the US-Japan economic power balance.

Candidate Leading Sectors

Amid a shifting technological landscape, the most likely sources of LS trajectories were information and communications technologies (ICTs). Certainly, scholars have highlighted technological developments in a wide range of industries as possible leading sectors, including lasers and robotics.²³ Nonetheless, Japan’s success in the computer, consumer electronics, and semiconductor industries was most relevant for its prospects of overtaking the United States as the foremost economic power. In each of these three leading sectors, Japan dominated the production of key components.²⁴

All three relatively new industries achieved extraordinary growth off the back of technological breakthroughs, fulfilling the established criteria for leading sectors. In the US economy during the 1980s, computer and data-processing services ranked as the fastest-growing industries in terms of jobs added.²⁵ After Japan’s MITI identified semiconductors and computers as strategic industries in 1971, both industries experienced extremely high growth rates in the next two decades.²⁶ The US electronics industry also experienced a remarkable surge during the late twentieth century, growing thirty times faster than other manufacturing industries by some estimates.²⁷ These trends in computers and electronics held across advanced industrialized countries.²⁸

Candidate GPTs

Given that the IR-3 is often known as the “information revolution,” clusters of ICT innovations naturally serve as the most likely sources of GPT trajectories. Efforts to map this era’s GPTs specifically highlight computers,²⁹ semiconductors,³⁰ and the internet.³¹ Each of these technological domains exhibited great scope for improvement and complementarities with other technologies.

Because advances in computers, semiconductors, and the internet were all closely connected, I group technological developments in these three domains under “computerization,” the process in which computers take over tasks such as the storage and management of information. This is consistent with other studies that identify the general category of ICT as the GPT operative in this period.³² The growing prevalence of software-intensive systems enabled computers to become more general-purpose. Computerization also benefited from advances in both semiconductors, which reduced costs for investments in IT equipment, and the internet, which connected computers in efficient networks.³³

Sources of LS and GPT Trajectories

In sum, the IR-3’s candidate leading sectors and GPTs all revolved around ICTs (table 5.1). Other technical advances in lasers, new sources of energy, and biotechnology also drew attention as possible sources of LS and GPT trajectories. I do not trace developments in these technologies because their potential was largely unrealized, at least within the context of US-Japan economic competition during the IR-3.

Though candidate leading sectors and GPTs converge on ICTs, they diverge on the key trajectories. The GPT perspective emphasizes the process by which firms transfer tasks and activities to computers. In contrast, LS accounts spotlight the growth of key industry verticals. For example, the consumer electronics industry fits the mold of previous candidate leading sectors like automobiles and cotton textiles, which were large and fast-growing but limited in their linkages to other industries. Taking stock of candidate leading sectors and GPTs merely functions as a preliminary filter. The rest of the chapter will further flesh out the differences between the LS- and GPT-based explanations of how these technologies affected the US-Japan economic rivalry.

TABLE 5.1. Key Sources of Technological Trajectories in the IR-3

Candidate Leading Sectors	Candidate GPTs
Computer industry	Computerization
Consumer electronics industry
Semiconductor industry

GPT vs. LS Mechanisms: The (Non) Spread of ICTs across Japan’s Economy

In both the IR-1 and IR-2 cases, a technological revolution sparked a shift in global economic leadership. The case analyses aimed to determine whether the historical evidence fit better with observable implications derived from the GPT or LS mechanism. Revolutionary technological breakthroughs also occurred in the IR-3, but an economic power transition never occurred. Thus, if the case study reveals that Japan dominated innovation in the computer, consumer electronics, and semiconductor industries, then this would provide disconfirming evidence against the LS mechanism. Likewise, if the historical evidence shows that Japan led the way in computerization during this period, this would undermine GPT diffusion theory.

LS Mechanism Present: Japan Dominates the Production of Key ICTs

A bevy of evidence establishes that the LS mechanism was operative during the IR-3. From the mid-twentieth century through the 1980s, Japan captured a growing global market in new industries tied to major technological discoveries in semiconductors, consumer electronics, and computers. In dynamic random-access memory (DRAM) chips, one of the highest-volume verticals in the semiconductor industry, Japanese firms controlled 76 percent of the global market share.³⁴ By one US federal interagency working group estimate, from 1980 to 1987 the United States lost the lead to Japan in more than 75 percent of critical semiconductor technologies.³⁵

Japanese industry also gained competitive advantages in consumer electronics. From 1984 to 1990, US firms lost global market share in thirty-five of thirty-seven electronics categories as Japanese firms took over the production of many electronic products.³⁶ Japan occupied dominant shares of global production of color televisions and DVDs.³⁷ It was also the first economy to commercialize high-definition television (HDTV) systems, a highly touted part of the consumer electronics market.³⁸

A similar trend held in computers, especially in computer hardware components like flat panel displays.³⁹ The US trade balance in computers with Japan turned from a surplus in 1980 into a $6 billion deficit by 1988.⁴⁰ According to the Yearbook of World Electronics Data, in 1990 Japan’s share of global computer production eclipsed the share held by the United States, which had previously led the world.⁴¹

Comparisons of LS growth rates also indicate that Japan was poised to overtake the United States as the economic leader. In an International Organization article published in 1990, Thompson posited that average annual growth rates in leading sectors across major economies heralded shifts in economic leadership. Over the nineteenth century, Britain’s growth rate in leading sectors peaked in the 1830s before flattening between 1860 and 1890, a period when the United States and Germany outstripped Britain in LS growth rates.⁴² Crucially, Thompson’s data showed that Japan outpaced the United States in growth rates within leading sectors from 1960 to 1990.⁴³ Linking these historical trends, Thompson identified Japan as America’s main competitor for “systemic leadership.”⁴⁴

Comprehensive assessments of Japan’s relative industrial strength support this account of LS growth rates. US government, academic, and industry entities issued a plethora of reports warning of Japan’s growing global market share and exports in key technologies. One review of six such reports, all published between 1987 and 1991, found a growing consensus that US capabilities in many of these technologies were declining relative to Japan’s.⁴⁵ A 1990 US Department of Commerce report on trends in twelve emerging technologies, including supercomputers, advanced semiconductor devices, and digital imaging technology, projected that the United States would lag behind Japan in most of these technologies before 2000.⁴⁶ The 1989 MIT Commission on Industrial Productivity’s Made in America: Regaining the Productive Edge serves as a particularly useful barometer of Japan’s position in leading sectors.⁴⁷ Made in America argued that the United States was losing out to Japan in eight manufacturing sectors, including consumer electronics, semiconductors, and computers. As business historian Richard Langlois summarizes, “By the mid-1980s, by most accounts, America had ‘lost’ consumer electronics and was in imminent danger of losing semiconductors and computers.”⁴⁸

Some argued that Japan’s advantage in these leading sectors was rooted in certain institutional arrangements. Observers regularly pointed to Japan’s keiretsu system, which was structured around large, integrated business groups, as the key institutional factor in its success in high-tech industries. The MIT Commission’s Made in America report, for instance, questioned whether the US system of industrial organization could match up against “much stronger and better organized Japanese competition.”⁴⁹ This aligned with a common narrative in the mid-1980s that “American firms should become more like Japanese firms.”⁵⁰

Others pointed to Japan’s industrial policy, coordinated by MITI, as the key institutional competency that explained Japan’s success in leading sectors. Academics and policymakers pushed for the United States to imitate Japan’s industrial policy approach, which they perceived as effective because of MITI’s ability to strategically coordinate R&D investments in key technologies.⁵¹ For instance, scholars regarded the “Fifth Generation Project,” a national initiative launched by MITI in 1982, as a stepping-stone to Japan’s building of the world’s most advanced computers.⁵² By comparison, the American aversion to industrial policy, combined with a decentralized economic policymaking apparatus, was alleged to be detrimental to innovation in the IR-3’s new technologies.

By the turn of the millennium, such arguments were no longer being put forward. Despite capturing key LS industries and rapidly catching up to the United States in the 1980s, Japan did not ultimately overtake the United States as the lead economy. Contrary to the expected outcome of the LS mechanism, Japan’s control of critical sectors in semiconductors and consumer electronics did not translate into strong, sustained economic growth. This outcome challenges the LS mechanism’s validity in the IR-3 case.

An Absent GPT Mechanism: The United States Leads Japan in ICT Diffusion

Does evidence from the IR-3 also discredit the GPT mechanism? If the components of the GPT mechanism, like those of the LS mechanism, were present during this period, this would weaken the explanatory power of GPT diffusion theory. However, in contrast to its success in key leading sectors, Japan lagged in adopting computerized technologies. Thus, GPT diffusion theory would not expect Japan to have overtaken the United States in the IR-3.

To account for sustained US economic leadership in the IR-3, this section traces the developments of the IR-3 in the United States and Japan across the three dimensions that differentiate GPT from LS trajectories. First, relative to the LS mechanism, the impact timeframes of the IR-3’s technological breakthroughs are more elongated. Developments in ICTs did not spread to a wide range of economic applications until the 1990s. Second, though Japan excelled in the production of computers and electronics, it fell behind in the general pace of computerization across the economy. Lastly, Japan’s advantages were concentrated in a narrow range of ICT-producing industries, whereas the United States benefited from broad-based productivity growth.

IMPACT TIMEFRAME

The advance of computerization, like past GPT trajectories, demanded a prolonged period of organizational adaptation and complementary innovations. It is reasonable to date the computerization GPT’s emergence to 1971, the year when Intel introduced the microprocessor, which greatly expanded the functionalities of computers.⁵³ It was also the year when the share of information technology equipment and software reached 1 percent in the net capital stocks of the median sector in the US economy.⁵⁴ Before then, during the 1960s, mainframe computers powered by integrated circuits serviced only a limited range of commercial purposes, such as producing bank statements and managing airline reservations. With the internet’s rise in the 1990s, new information and communication networks further spread computerization to different business models, such as e-commerce.⁵⁵ Alongside this stream of complementary technical advances, firms needed time to build up their computer capital stock and reorganize their business processes to match the needs of information technology.⁵⁶

Computers traveled a gradual and slow road to widespread use. By the late 1980s, many observers bemoaned the computer revolution’s failure to induce a surge of productivity growth. In 1987, the renowned economist Robert Solow distilled this “productivity paradox” in a famous quip: “We see the computers everywhere but in the productivity statistics.”⁵⁷ A decade later, however, the growing adoption of information technology triggered a remarkable surge in US productivity growth.⁵⁸ It took some time, but the American economy did eventually see the computers in the productivity statistics.

The landscape of US-Japan technology competition looks very different when accounting for the elongated lag from computerization’s arrival to its widespread diffusion. Japan’s control over key segments of ICT production in the 1970s and 1980s did not correspond to an advantage in GPT diffusion. A more patient outlook illuminates that the delayed impact of computerization aligns with when the United States extended its productivity lead over Japan. After 1995, while Japan’s economic rise stalled, labor and total factor productivity grew rapidly for a decade in the United States. The difference was that the United States benefited greatly from an ICT-driven productivity acceleration.⁵⁹

FIGURE 5.3. The US-Japan Computerization Gap Widens in the 1990s. *Source*: Milner and Solstad 2021; Comin 2010, 381.

PHASE OF RELATIVE ADVANTAGE

If the GPT mechanism was operative for Japan’s rise in the IR-3, Japan should have led the United States in computerization. Though Japan continued to contest US leadership in the production of certain computer architectures and devices, it failed to keep up with the United States in the adoption of computers across industries. As figure 5.3 reveals, the gap between the United States and Japan in computerization widened in the 1990s. In fact, South Korea, which lagged behind Japan in generating new innovations in computer systems, surpassed Japan in its computer usage rate during the 1990s. Taken together, these indicators suggest that Japan’s problem was with GPT diffusion, not innovation.

The pathway by which ICTs drove the US-Japan productivity gap is particularly revealing. In sectors that produced ICTs, Japan’s TFP acceleration was similar to the US trajectory; however, in sectors that intensively used IT, Japan’s TFP growth lagged far behind that of its rival.⁶⁰ In particular, US ICT-using service industries adapted better to computerization. In terms of labor productivity growth in these industries, the United States experienced the strongest improvement out of all OECD countries from the first half of the 1990s to the second half of the decade.⁶¹ The contribution of ICT-using services to Japan’s labor productivity growth, by contrast, declined from the first half to the second half of the decade.⁶²

Japan eventually adopted the GPT trajectory associated with ICT. Like all advanced economies at the frontier, Japan could draw from the same technology pool as America. Using a growth accounting framework that accounts for cyclical factors, Takuji Fueki and Takuji Kawamoto trace Japan’s post-2000 resurgence in TFP growth to the extension of the ICT revolution to a broader range of IT-using sectors.⁶³ By then, however, Japan was at least five years behind the United States in taking advantage of computerization.

BREADTH OF GROWTH

Alongside the dispersion of ICTs throughout the US economy, the sources of productivity growth also spread out. In the United States, spillovers from ICT advances, especially in services, contributed to economy-wide TFP growth. Industry-level patterns of TFP growth reveal that US productivity growth became noticeably more broad-based after 1995, a trend that accelerated further after 2000.⁶⁴

In contrast, Japan’s advantages in the IR-3 were concentrated in a narrow range of sectors. After 1995, Japanese productivity growth remained localized, only transitioning to a more broad-based pattern after 2000.⁶⁵ Michael Porter’s exhaustive account of national competitiveness across leading economies describes Japan as a “study in contrasts,” with some of the most internationally competitive industries found side by side with some of the most uncompetitive.⁶⁶ The hype over Japan’s success in leading sectors induced some analysts to generalize from a few exceptional industrial sectors while overlooking developments in struggling sectors.⁶⁷

LS Mechanisms and “Wintelism”

Some international political economy scholars divide trends in US competitiveness into a phase of relative decline in the leading sectors of the 1980s (consumer electronics, computer hardware, and parts of the semiconductor industry) and a period of resurgence in the new leading sector of the 1990s (software electronics).⁶⁸ This explanation for why Japan’s LS advantage did not convert into an economic power transition could conceivably restore credibility to the LS mechanism. However, even this generous interpretation fails to capture the dynamics of the IR-3.⁶⁹

Consider one prominent line of LS-based thinking that emphasizes US advantages in adapting to “Wintelism,” a type of industrial structure best suited for new advances in software electronics. Wintelism, a portmanteau of “Windows” and “Intel,” refers to the transformation of the computer industry from a vertically integrated oligopoly into a horizontally segmented structure dominated by component providers that control architectural standards, such as Intel and Microsoft.⁷⁰ Compared to Japan, the US institutional environment was more supportive of horizontal, specialized value-chains in software electronics. Proponents of this view argue that Japan’s inability to adapt to the Wintelist industrial paradigm explains why it was unable to overtake the United States as the leading economy.⁷¹

The Wintelism argument falls prey to general issues with the LS mechanism.⁷² Like LS accounts of Japan’s advantage in semiconductors and consumer electronics in the 1980s, the Wintelism argument still places too much value on ICT-producing industries. Capturing profit shares in the global software electronics industry is not the same as translating new advances in software electronics into economy-wide growth.⁷³ The pathway from new technologies to overall productivity growth involves much more than just the success of companies like Microsoft and Intel.

Indeed, as a GPT diffuses more widely, large monopolists may hinder coordination between the GPT and application sectors. Both Microsoft and Intel, for instance, often restricted the sharing of information about their technology roadmaps, thereby hindering complementary innovations and the adoption of microelectronics in application sectors such as automobiles. In fact, regulatory and technological forces were necessary to limit the influence of dominant firms and encourage the development of complementary technologies, which were crucial to widening the GPT trajectory of computerization.⁷⁴ It is conceivable that the United States could have achieved a greater rate of computerization if its computer industry had not been dominated by two firms that held key architectural standards.

Overall, hindsight is not kind to LS-based accounts of Japan’s institutional advantages in the IR-3. While matching institutional competencies to particularly significant technological trajectories is a sound approach, the LS trajectory fails to capture how technological changes opened opportunities for an economic power transition. Japan’s industrial structure and sectoral targeting policies reaped economic gains that were temporary and limited to specific industries.⁷⁵ To understand why the United States gained lasting and broad-based advantages from computerization, a different set of institutional complementarities must be explored.

Institutional Complementarities: GPT Skill Infrastructure in the IR-3

In line with GPT diffusion theory, institutional adaptations that widened the base of computer engineering skills and knowledge proved crucial to the enduring technological leadership of the United States in the IR-3.⁷⁶ Computerization required not just innovators who created new software architectures but also programmers who undertook more routine software engineering tasks. It was the US ability to tap into a more expansive pool of the latter that fueled a more intensive pace of computerization than in Japan.

Widening the Base of Computer Engineers

GPTs generate an imbalance between the possibility of sweeping changes across many domains and the constraints of existing skills. Historically, engineering disciplines have developed procedures for adapting new GPT-linked knowledge across localized applications and expanded access to such knowledge to a broader set of individuals. Similarly, the key element of the American GPT skill infrastructure in the IR-3 was the development of a computer science discipline—the latest in a line of engineering disciplines that have emerged in the wake of GPTs.⁷⁷

US education effectively adapted to changes in the computerization trajectory. The recognition of computer science as an independent discipline, as evidenced by the early and rapid growth of computer science departments in the United States, helped systematize the knowledge necessary for the spread of computerization.⁷⁸ Led by top universities and the Association for Computing Machinery (ACM), US institutions piloted new training programs in computer science. In 1968, the ACM published an influential and forward-looking curriculum that helped colleges organize their computing education.⁷⁹ These adaptations converted the practical experience of developers of new computer and software breakthroughs into general, accessible knowledge.⁸⁰

The development of this GPT skill infrastructure met significant hurdles. In the 1970s, the US computer science discipline struggled to meet demands for software engineering education in the midst of conflicts over the right balance between theory-oriented learning and hands-on programming work.⁸¹ Reacting to this situation, the ACM’s 1978 curriculum revision directed computer science toward application-based work, stating that programming exercises “provide a philosophy of discipline which pervades all of the course work.”⁸² Industry pressure and the US Department of Defense’s Software Engineering Institute, a partnership established with Carnegie Mellon University in 1984, expanded the number of software engineering specializations at universities.⁸³ Teaching capacity was another inhibiting factor. Computer science departments had to limit course enrollments because they could not fill the faculty positions to meet exploding student demand, resulting in a decline in computer science enrollments in the mid-1980s.⁸⁴

Though the process was not seamless, overall trends reflect the US success in widening the pool of engineering skills and knowledge for advancing computerization. From 1966 to 1996, the number of undergraduate computer science degrees awarded annually in the United States grew from 89 to about 24,500.⁸⁵ In 1986, at its peak during this period, computer science accounted for 12.5 percent of all science and engineering degrees awarded in the United States.⁸⁶ According to benchmarking efforts by the Working Group for Software Engineering Education and Training, US institutions accounted for about one-third of the world’s bachelor’s degree programs in software engineering in 2003.⁸⁷ Throughout this period, the United States also benefited from a system open to tapping foreign software engineering talent.⁸⁸

The US Comparative Advantage over Japan in GPT Skill Infrastructure

Did GPT skill infrastructure factor decisively in Japan’s inability to keep pace with the United States in the diffusion of ICTs? Varying data reporting conventions, especially the fact that Japanese universities subsumed computer science under broader engineering fields, make it difficult to precisely quantify the US-Japan gap in software engineering skills.⁸⁹ One narrative that gained momentum in the late 1980s attributed Japan’s success in high-tech manufacturing industries to its quantitative advantage in engineers. A National Science Foundation (NSF) report and two State of the Union addresses by President Ronald Reagan endorsed this belief.⁹⁰ For example, the NSF’s 1997 special report on Japan’s scientific and technological capabilities declared, “By 1994, with roughly one-half the population, Japan produced more engineering and computer science degrees at the undergraduate level than the United States.”⁹¹ Such statements blended computer science degrees with all categories of engineering education.

Computer science–specific data reveal the US edge in ICT human resources. According to data from Japan’s Information Technology Promotion Agency, Japan awarded about 16,300 computer science and mathematics bachelor’s degrees in 2009, while the United States awarded 63,300 of these types of degrees that same year.⁹² One survey by the Japanese Information Technology Services Industry Association found that only 3.6 percent of college graduates who entered Japan’s information service industry in April 1990 received their degree from a computer science department.⁹³ By counting immigrants entering computer-related professions and total university graduates in ICT software and hardware fields, one study estimated annual inflows into the US and Japanese ICT labor pools. In 1995, these inflows into the US ICT labor pool outpaced those in Japan by 68 percent. By 2001, this gap between the two countries’ annual inflows of ICT talent had reached almost 300 percent.⁹⁴ Therefore, in the years when the US advantage in ICT diffusion was most pertinent, the skill gap between the United States and Japan in computer and software engineering talent grew even wider.⁹⁵

Moreover, a computer science degree in Japan did not provide the same training as one in America. First, Japanese universities were slow to adapt to emerging trends in computer science. In both 1997 and 2007, the Information Processing Society of Japan modeled its computing curriculum revisions on American efforts that had been made six years earlier.⁹⁶ The University of Tokyo, Japan’s leading university, did not establish a separate department of computer science until 1991, which was twenty-six years later than Stanford.⁹⁷ Overly centralized governance of universities also inhibited the development of computer science as an independent discipline in Japan.⁹⁸ As Jeffrey Hart and Sangbae Kim concluded in 2002, “The organizational and disciplinary flexibility of US universities in computer science has not been matched in any of the competing economies.”⁹⁹

Software engineering presented a particular challenge for Japan. In 1988, the Japanese-language industry journal Nikkei Computer surveyed six thousand Japanese firms that used office computers. Situated outside the computer industry, these firms were involved in a broad range of fields, including materials manufacturing, finance, services, government, and education. More than 80 percent of the responding companies disclosed shortages of software programmers and designers.¹⁰⁰ On average, outside firms provided one-quarter of their information technology personnel, and their reliance on outsourcing was magnified for programmers, system designers, and application managers.¹⁰¹ A nonprofit foundation for ICT development in Japan reported similar barriers to computer utilization in 1991. The survey found that companies relied heavily on computer personnel, especially software engineers, dispatched temporarily from other organizations.¹⁰² Small and medium-sized software departments, which could not afford to invest in on-the-job training, were especially disadvantaged by the lack of formal software engineering education in Japan.¹⁰³

Bibliometric techniques can help substantiate the gap between the United States and Japan in skill infrastructure for software engineering. I analyzed around seven thousand publications from 1995 in the Web of Science Core Collection’s “Computer Science, Software Engineering” category.¹⁰⁴ To gauge the breadth of institutional training in software engineering, I counted the number of Japanese and American universities that employed at least one researcher represented in this dataset. According to my estimates, the United States boasted 1.59 universities per million people that met this baseline quality of software engineering education, while Japan only had 1.17 universities per million people. This amounts to a gap of around 40 percent.
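The per-capita comparison above is simple arithmetic, and a minimal sketch makes the calculation explicit. The function names `per_million` and `relative_gap` are illustrative and are not taken from the analysis itself; the two rates are the rounded figures quoted in the text. With those rounded inputs the ratio works out to roughly 36 percent, so the "around 40 percent" characterization presumably reflects rounding or the unrounded underlying counts, which are not reported in this passage.

```python
def per_million(university_count: int, population: int) -> float:
    """Universities per one million inhabitants."""
    return university_count / (population / 1_000_000)


def relative_gap(leader: float, follower: float) -> float:
    """How much larger `leader` is than `follower`, as a fraction of `follower`."""
    return leader / follower - 1.0


# Rounded per-capita rates quoted in the text (universities per million people).
US_RATE = 1.59
JAPAN_RATE = 1.17

gap = relative_gap(US_RATE, JAPAN_RATE)
print(f"US rate exceeds Japan's by {gap:.1%}")
```

Expressing the gap relative to Japan's rate is one possible convention; measuring it relative to the US rate would yield a smaller percentage.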

Lastly, weak industry-university linkages in computer science hampered Japan’s development of GPT skill infrastructure. Imposing centralized control over universities, the Japanese Ministry of Education, Science, and Culture (MESC) inhibited cooperation between new departments of information science and the corporate labs where much of the computing talent was concentrated.¹⁰⁵ Japanese researchers regularly complained about the size of MESC grants, as well as the ministry’s restrictions on their ability to seek alternative funding sources. Japan’s overall budget level for university facilities in 1992 remained the same as it was in 1975. Additional government funds went instead to independent centers of excellence, diverting resources away from efforts to broaden the pool of training institutions in software engineering.¹⁰⁶

Alternative Factors

How do alternative explanations perform in this case? A range of factors beyond GPT skill infrastructure could have influenced diverging rates of ICT diffusion in the United States and Japan. I focus on the role of external threats and varieties of capitalism because they present alternative mechanisms for how states adapted differently to the technological revolution that occurred in the IR-3.

Threat-Based Explanations

Threat-based theories struggle to account for differences in the US and Japanese technological performance in this period. A “cult of vulnerability” permeated Japan’s leaders over this period as they coped with tensions in East Asia and the oil crises of the 1970s.¹⁰⁷ Likewise, the growth of the US “national security state,” fueled by the dangers of the Cold War, functioned as “the secret to American innovation.”¹⁰⁸ Under his creative insecurity framework, Taylor holds up both Japan and the United States as exemplars of the IR-3 period, reasoning that they both partly owed their technological success to the galvanizing effects of a threatening international environment.¹⁰⁹ General threat-based explanations therefore cannot explain differences in technological outcomes between the United States and Japan, namely, why the United States was more successful in ICTs than Japan.

A related argument points to the significance of US military procurement for computerization. As was the case with its influence on the American system of manufacturing in the IR-2, the US military provided the demand for initial investments in computers and semiconductors. In the 1940s and 1950s the US military was a key patron behind computing breakthroughs.¹¹⁰ Assured by large military procurements, innovative firms undertook risky, fundamental research that produced spillovers to many other industries. For instance, the first all-purpose electronic digital computer, the University of Pennsylvania’s Electronic Numerical Integrator and Computer (ENIAC), was developed during World War II. The ENIAC was supported by funding from the Army’s Ballistic Research Laboratory, and the first program run on the computer was a simulation of the ignition of the hydrogen bomb.¹¹¹

In place of the military, could other entities have served as a large demand source for ICTs? Commercial entities like Bell Labs and IBM also developed fundamental breakthroughs in ICTs. According to Timothy Bresnahan and Manuel Trajtenberg, it was “only a coincidence” that US government demand played such a pivotal role in semiconductor development.¹¹² Others argue that while commercial advances in semiconductors and computers would likely still have occurred absent the impetus of military funding, they would have emerged after substantial delay.¹¹³

Resolving this debate depends on one’s view of the key stage in computerization. Those who emphasize the importance of military procurement often hold up the importance of first-mover advantages in the American computer industry.¹¹⁴ However, decades after the military helped develop the first computers and transistors, Japan had cornered the market in many related industries. The significance of military procurement is diminished when a GPT’s dissemination, as opposed to its emergence, is taken as the starting point. By 1960, the start of the IR-3 period, ICT development in the United States was already much less reliant on military support. In 1955, the demand for Bell’s transistors from two large telephone networks alone was nearly ten times more than from all military projects.¹¹⁵ In fact, as the commercial sector increasingly drove momentum in ICTs, military involvement arguably hindered continued advances in the commercial sector, as there was tension between the different technical cultures.¹¹⁶

On balance, the most significant aspect of the military’s involvement in the advance of computerization in the United States was its role in building up GPT skill infrastructure. The US military played a key role in cultivating the computer science discipline in its early years. Beginning in the 1960s, defense agencies supported academic research in computer science, such as the aforementioned Software Engineering Institute, which created centers of excellence and broadened the base of computer science education.¹¹⁷ From 1977 through the mid-1980s, defense funding supported more than half of academic computer science R&D.¹¹⁸ At the same time, military investment in computer science did not come without downsides. Defense funding was concentrated in elite research universities at the cutting edge of the field, such as Carnegie Mellon and Stanford, whereas nondefense government funding supported computer science education across a wider range of US universities.¹¹⁹ On the effects of military computer science funding, Stanford professor Terry Winograd wrote, “It has resulted in a highly unequal situation in which a few schools have received almost all the resources. Although this may have led to more effective research in the short run, it has also been a factor contributing to the significant long-term shortage of trained computer researchers.”¹²⁰

VoC Explanations

The varieties-of-capitalism (VoC) approach provides another possible explanation for why the US economy benefited more than Japan’s from the innovations of the IR-3.¹²¹ According to the VoC framework, firms in coordinated market economies (CMEs) provide industry-specific training that is more conducive to incremental innovation, whereas worker training in more general skills in liberal market economies (LMEs) proves more favorable for radical innovations. VoC scholars point to some evidence from the international pattern of innovation during the IR-3 that supports these expectations. Based on patent data from 1983–1984 and 1993–1994, Peter Hall and David Soskice find that Germany, a CME, specialized in technology classes characterized by incremental innovation, whereas the United States, an LME, specialized in domains characterized by radical innovation.¹²² Therefore, the VoC perspective expects that Japan, a CME like Germany, was unable to keep up with the United States in the IR-3 because high-tech sectors such as computer software and biotechnology demanded radical innovations.¹²³

This VoC-derived explanation provides an incomplete account of the IR-3 case. First, comprehensive empirical investigations into the innovative performance of CMEs and LMEs, especially the success of Japan as a radical innovator, undermine the explanatory power of VoC theory for the IR-3 period. Hall and Soskice’s initial analysis was based on four years of data on patent counts from only two countries, the United States and Germany. Taylor’s more extensive analysis, which covered thirty-six years (1963–1999) of patent counts and forward citations for all LME and CME countries, found that the predictions of VoC theory are not supported by the empirical data.¹²⁴ In fact, contrary to the expectations of the VoC explanation, Japan was a leader in radical innovation, ranking second only to the United States in patent counts weighted by forward citations, which are a strong proxy for the radicalness of innovations.¹²⁵

Second, VoC theory does not make distinctions between different types of general skills, which varied in their significance to national success in the IR-3. Regarding general training in terms of foundational schooling, Japan was making substantial improvements in average years of schooling, enrollment ratio, and access to higher education.¹²⁶ GPT diffusion theory specifies the key general skills as those that best suited the advance of computerization. Consistent with these expectations, the case study evidence points to the US-Japan gap in software engineering, a set of general skills that permeated sectoral boundaries, as the crucial factor in US success with widespread computerization.

Case-Specific Factors

Other factors unique to the IR-3 case deserve further consideration. Among these alternative explanations, one popular theory was that Japan’s kanji system (Chinese-based characters) contributed to its slow adoption of computers.¹²⁷ Marshall Unger, a specialist in the Japanese writing system, highlighted difficulties with representing kanji in computerized formats, which resulted in higher costs for storing data and for word-processing functions.¹²⁸ American computers had to handle only ninety-five printable characters, whereas Japanese personal computers needed to store six thousand Japanese characters.¹²⁹ Not only did language differences increase the cost of Japanese computers, but they also prevented Japanese adopters from using off-the-shelf computers from the United States, as these did not support Japanese language functions.

While particularities of the Japanese language may have initially hindered Japan’s computerization, it is important not to overstate the impact of this language barrier. In a review of this theory, another expert on computational linguistics argued that Unger overemphasized the additional overhead and speed costs associated with Japanese writing systems.¹³⁰ Moreover, users and companies adapted over time. By the end of the 1980s, advances in processor technology allowed computers to support the greater word-processing demands of Japanese language systems.¹³¹ Therefore, during the critical years when the US-Japan computerization gap widened, the impact of the kanji system was less pronounced.

Summary

Through much of the late twentieth century, it was only a matter of time until Japan achieved economic preeminence—at least in the eyes of many analysts and scholars. Invoking the assumptions of the LS mechanism, they expected that this economic power transition would be brought about by Japan’s advantages in new sectors such as consumer electronics, semiconductor components, and computer hardware. Today, after Japan’s decade-long slowdown in productivity growth, there is virtually no discussion of it overtaking the United States as the leading economic power.

Looking back, one might be tempted to conclude that history has vindicated past critics who labeled the claims of Japan’s imminent ascension to technological hegemony as “impressionistic,”¹³² as well as the retrospective analyses that called out such projections for being “premature.”¹³³

This chapter’s conclusions suggest a more nuanced interpretation. It is not that the prognoses of LS-based accounts were overeager or overly subjective. The real issue is that they were based on faulty assumptions about the pathway by which technological advances make economic power transitions possible. Indeed, the IR-3 case provides strong negative evidence against the LS mechanism, revealing that the expected outcome of an economic power transition failed to materialize in part because of the US advantage in GPT diffusion. The relative success of the United States in diffusing the trajectory of computerization across many ICT-using sectors, in line with GPT diffusion theory, was due to institutional adaptations to widen the skill base in computer engineering. In sum, the US advantage in GPT diffusion accounts for why the economic power transition expected by the LS account failed to transpire.

6 A Statistical Analysis of Software Engineering Skill Infrastructure and Computerization

I have argued that the shape of technological change is an overlooked dimension of the rise and fall of great powers. Most researchers point to various institutions to explain why some countries experience more scientific and technological progress than others. A central insight of this book is that the institutional factors most relevant for technological leadership depend on whether key technological trajectories map onto GPT diffusion or LS product cycles. GPT diffusion theory posits that great powers with better GPT skill infrastructure, defined as the ability to broaden the engineering skills and knowledge linked to a GPT, will more effectively adapt to technological revolutions.

This chapter evaluates a key observable implication of GPT diffusion theory. The expectation is that where there is a wider pool of institutions that can train engineering talent related to a GPT, there will be more intensive rates of GPT diffusion. Using data on computerization and a novel approach to estimate the number of universities that provide quality software engineering education in a country, this chapter first tests the theorized connection between GPT skill infrastructure and GPT adoption on time-series cross-sectional data for 19 advanced and emerging economies from 1995 to 2020. I supplement this panel analysis with two additional tests: a duration model of the speed by which 76 countries achieved a certain computerization threshold, as well as a cross-sectional regression of data on 127 countries averaged over the 1995–2020 period. While the historical case studies verified this relationship among great powers, large-n quantitative analysis allows us to explore how GPT diffusion applies beyond the chosen case studies.

To preview the findings, the evidence in this chapter backs GPT diffusion theory. Countries better positioned to widen their base of software engineering skills preside over higher rates of computer adoption. This relationship holds even when accounting for other factors that could affect computerization and different specifications of the independent and dependent variables. This chapter proceeds by operationalizing computerization rates and skill infrastructure in software engineering and then statistically testing the relationship between the two variables.

Operationalizing the Independent Variable: GPT Skill Infrastructure in Software Engineering

My key independent variable is skill infrastructure connected to computerization. The computer, a prototypical GPT, represents a natural choice for this type of inquiry, as engineering education data for many past GPTs is nonexistent for many countries.¹ Plus, enough time has passed for us to see the effects of computerization. The statistical analysis focuses on the effects of skill formation institutions in software engineering, the computer science discipline tasked with training generalists in computing technology.² Concretely, this chapter’s measure of GPT skill infrastructure captures the breadth of a country’s pool of software engineering skills and knowledge.

Efforts to measure the GPT skill infrastructure in software engineering face three challenges. First, standardized measures of human capital in computer science across countries are not available. The UNESCO Institute for Statistics (UIS) collects internationally comparable data on technicians and researchers in various fields, but this dataset does not include information specific to computer science and has limited temporal coverage.³ Second, variation across countries in the format of computer science education undercuts some potential benchmarks, such as undergraduate enrollments in computer science programs. In some countries, computer science education is subsumed under a broad engineering category, not recognized as a separate degree course.⁴ Lastly, comparisons of computer science education struggle to account for the quality of such training. International rankings of universities for computer science garner media coverage, but they rely on subjective survey responses about reputation and largely concentrate on elite programs.⁵

To the extent possible, my measure of GPT skill infrastructure addresses these obstacles. The goal is to operationalize engineering-oriented computer science education in a way that can be standardized across countries and accounts for differences in the format and quality of computer science education. My novel approach estimates the number of universities in each country that can be reasonably expected to provide a baseline quality of software engineering education. To establish this baseline in each country, I count the number of universities that employ at least one researcher who has published in a venue indexed by the Web of Science (WoS) Core Collection’s category on “Computer Science, Software Engineering.” In this category, the WoS citation database extends back to 1954 and allows for reliable cross-country comparisons based on institutional affiliations for published papers and conference proceedings.⁶ This approach is also insulated from distinctions related to whether certain degrees count as “computer science” programs or as “general engineering” courses. A particular university’s naming scheme has no bearing; as long as an institution retains at least one researcher who has published in the software engineering field, it counts in the GPT skill infrastructure measure.

To gather data on the number of universities that contribute to software engineering skill formation around the world, I analyze 467,198 papers from the WoS Core Collection’s “Computer Science, Software Engineering” category published between the years 1995 and 2020. I use the Bibliometrix open-source software to derive institutional and country affiliations from this corpus.⁷ Specifically, I collect the university and country affiliations for the corresponding authors of all 467,198 publications. For each country, I count the number of distinct university affiliations. Hypothetically, if country X’s researchers were all concentrated at a single center of excellence, it could boast more researchers represented in the corpus than country Y but still score lower on my metric. For making comparisons across countries, the number of distinct university affiliations is a better indicator of a country’s access to a broad pool of training institutions for software engineering, which is central to GPT skill infrastructure.
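The counting step described above can be illustrated with a minimal Python sketch. It is not the Bibliometrix workflow used for this chapter; the toy records, the `(author, university, country)` layout, and the function names are all hypothetical and serve only to show why distinct affiliations and researcher headcounts can rank countries differently.

```python
from collections import defaultdict

# Hypothetical toy records: (corresponding author, university, country).
papers = [
    ("a1", "Univ X1", "X"),
    ("a2", "Univ X1", "X"),
    ("a3", "Univ X1", "X"),
    ("a4", "Univ X1", "X"),
    ("b1", "Univ Y1", "Y"),
    ("b2", "Univ Y2", "Y"),
    ("b3", "Univ Y3", "Y"),
]


def count_distinct(records, field):
    """Return {country: number of distinct values of `field`}.

    `field` is 0 for authors and 1 for university affiliations.
    """
    seen = defaultdict(set)
    for record in records:
        country = record[2]
        seen[country].add(record[field])
    return {country: len(values) for country, values in seen.items()}


universities = count_distinct(papers, field=1)
authors = count_distinct(papers, field=0)

print("distinct universities:", universities)
print("distinct authors:", authors)
```

In this toy corpus, country X has more researchers than country Y, but all of them sit at a single institution, so X scores lower on the distinct-affiliation measure.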

FIGURE 6.1. Software Engineering Skill Infrastructure by Country (2007). *Source*: Author’s calculations based on Web of Science Core Collection database.

I estimate a country’s GPT skill infrastructure in a particular year by averaging its score on this metric in that year along with its scores in the two previous years. This step provides checks against the risk that developments specific to a particular year—a conference cancellation, for example—may muddle the measure. To illustrate country distributions on this metric, figure 6.1 depicts the number of universities that meet my baseline for software engineering skill formation in G20 countries for one of the middle years in the dataset.
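As a rough illustration of this smoothing step, the sketch below averages a country's score in a given year with its scores in the two preceding years. The yearly counts are invented, the function name is hypothetical, and the handling of the first years of the dataset (where fewer than two prior years exist) is not specified in the text, so the sketch simply raises an error in that case.

```python
def trailing_average(yearly_counts, year, window=3):
    """Mean of the score in `year` and the (window - 1) preceding years.

    Raises KeyError if any year in the window is missing (an assumption;
    the chapter does not say how incomplete windows are treated).
    """
    years = range(year - window + 1, year + 1)
    values = [yearly_counts[y] for y in years]
    return sum(values) / len(values)


# Hypothetical counts of qualifying universities for one country.
counts = {2005: 40, 2006: 46, 2007: 31, 2008: 48}

smoothed_2007 = trailing_average(counts, 2007)
smoothed_2008 = trailing_average(counts, 2008)
print(smoothed_2007, smoothed_2008)
```

The dip in the invented 2007 count (for instance, from a cancelled conference) is dampened in both the 2007 and 2008 smoothed values.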

As with all bibliometric research, given the bias toward English-language papers in publication datasets, this method could undercount publications from non-English-speaking countries.⁸ Fortunately, this linguistic bias found in social science papers is less pronounced in engineering and mathematics papers, which comprise my dataset.⁹ Another factor that mitigates this bias is the very low standard for quality engineering education. Even if an institution’s researchers publish a substantial portion of their writings in a non-English language, as long as just one publication landed in the WoS “Computer Science, Software Engineering” category, that institution would still count in my definition of GPT skill infrastructure.

I considered other measures of software engineering skills, but none were suitable for this type of analysis. Data on the number of full-time equivalent telecommunication employees, collected by the ITU, shed some light on the distribution of ICT skills in various economies. Concretely, this indicator captures the total number of people employed by telecommunication operators for the provision of fixed-telephone, mobile-cellular, and internet and data services.¹⁰ The rationale is that the number of employees in this critical ICT services industry represents the broader pool of software engineering talent in a country. However, since this measure is biased toward specialists who develop and install computer technologies and other ICTs, it overlooks many engineers who intensively use ICTs in their work, even if they are not involved in developing software and computing tools.¹¹

The time coverage of other measures was limited. For instance, the International Telecommunication Union (ITU) database on ICT skills like programming or coding in digital environments only goes back to 2019.¹² In its Global Competitiveness Index, the World Economic Forum surveys business executives on the digital skills of their country’s population, but this data series starts in 2017.¹³

Operationalizing the Dependent Variable: Computerization Rates

Divergences among nations in scientific and technological capabilities have attracted a wide range of scholarship. While my focus is on the overall adoption rate of GPTs within economies, many scholars and government bodies have made significant contributions to quantifying national rates of innovation, often based on patenting activity, publications, and R&D investments.¹⁴ Cross-national data on the diffusion of specific technologies, by comparison, has been sparse.¹⁵ The Cross-country Historical Adoption of Technology (CHAT) dataset, which documents the intensity with which countries around the world use fifteen historically significant technologies, has helped address this deficiency.¹⁶ Other studies of cross-national technology adoption gaps quantify the diffusion of the internet and government uses of information technology.¹⁷

My primary measure of computerization is the proportion of households with a computer. These data are sourced from the International Telecommunication Union’s World Telecommunication/ICT Indicators (WTI) database.¹⁸ In this dataset, access to a computer includes use of both desktop and portable computers but excludes devices with some computing ability, such as TV sets and mobile phones.¹⁹ By estimating the number of households in a country with access to a computer, this measure elucidates cross-country differences in the intensity of computerization. Though observations for some countries start in 1984, there is limited coverage before 1995, which serves as the initial year for the data collection effort detailed in this chapter.

After the ITU was tasked with supplying indicators for access to ICTs around the world—a crucial target of the United Nations’ Millennium Development Goals (MDG) adopted in 2000—the agency started to track the number of personal computers by country.²⁰ The ITU produces computer usage figures through two methods. First, when available, survey data from national and supranational statistical offices (such as Eurostat) are used. Though the MDG initiative has encouraged national statistical offices to help the ITU in monitoring ICT access, data coverage is still incomplete. If data on the number of households with a computer are unavailable for a country in one year, the ITU makes an estimate based on computer sales and import figures, adjusted to incorporate the average life of a computer as well as other related indicators, such as the number of computer users. For example, the computer usage indicator for Latvia comes from Eurostat in 2013, an ITU estimate in 2014, and the Central Statistical Bureau of Latvia in 2015.

Despite its limitations, I prefer the ITU’s computerization indicator over alternative measures. Francesco Caselli and Wilbur John Coleman examine the determinants of computer adoption across a large sample of countries between 1970 and 1990. To estimate the intensive margin of diffusion, they use the value of a country’s computing equipment imports per worker as a proxy for its computer investment per worker.²¹ However, imports do not account for a country’s computer investments sourced from a domestic computer industry; this is an issue that becomes more salient in the later years of the dataset.²²

A better indicator would estimate computer access and usage among businesses, since such economic activity is more likely to produce productivity improvements than household use. I examined a few alternatives. The CHAT dataset employs the number of personal computers per capita, which is one of three measures highlighted by the authors as conveying information on GPTs.²³ However, this indicator still does not capture the degree of computerization in productive processes, as opposed to personal use, and has limited temporal coverage compared to the ITU’s household computerization measure.²⁴ The OECD collects some data on ICT access and usage by businesses, but this effort did not start until 2005 and covers only OECD countries.²⁵

Fortunately, it stands to reason that a country’s household computer adoption can serve as a proxy for its computerization rates in business activities. In the appendix, I provide further support for this claim. Comparing the available data on household and business computerization for twenty-six countries between 2005 and 2014, I find a strong correlation between these two variables (correlation coefficient = 0.8).²⁶

Summary of Main Model Specifications

To review, this chapter tests whether GPTs diffuse more intensively and quickly in countries that have institutional advantages in widening the pool of relevant engineering skills and knowledge. The first hypothesis reads as follows:

H1: Countries with higher levels of GPT skill infrastructure in software engineering will sustain more intensive computerization rates.

With country-years as the unit of analysis, I estimate time-series cross-sectional (TSCS) models of nineteen countries over twenty-six years. Quantitative analysis permits an expansion of scope beyond the great powers covered in the case studies. As outlined in the theory chapter, differences in GPT skill infrastructure are most relevant for economies that possess the absorptive capacity to assimilate new breakthroughs from the global technological frontier.²⁷ Since less-developed economies are often still striving to build the baseline physical infrastructure and knowledge context to access the technological frontier, variation in GPT skill infrastructure among these countries is less salient. Thus, I limit the sample to nineteen G20 countries (the excluded member is the European Union), which represent most of the world’s major industrialized and emerging economies.

Before constructing TSCS regressions, I first probe the relationship between GPT skill infrastructure in software engineering and computerization rates. Prior to the inclusion of any control variables, I plot the independent and dependent variables in aggregate to gauge whether the hypothesized effect of GPT skill infrastructure is plausible. The resulting bivariate plot suggests that there could be a strong, positive relationship between these two variables (figure 6.2).²⁸

The basic trend in figure 6.2 provides evidence for the contention that countries better equipped with computer science skill infrastructure experience higher rates of computerization. Although these preliminary results point to a strong relationship between these two variables, further tests are needed to rule out unseen confounders that could create spurious correlations and influence the strength of this relationship. TSCS regression analysis facilitates a deeper investigation of the relationship between GPT skill infrastructure and computerization.

FIGURE 6.2. Software Engineering Skill Infrastructure and Computerization. *Source*: Author’s calculations, available at Harvard Dataverse: https://doi.org/10.7910/DVN/DV6FYS.

To control for factors that could distort the relationship between computer-related skill infrastructure and computerization, I incorporate a number of control variables in the baseline model. Rich countries may be able to spend more on computer science education; additionally, they also more easily bear the expenses of adopting new technologies, as exemplified by large disparities between developed and developing countries in information technology investment levels.²⁹ The inclusion of GDP per capita in the model accounts for economic development as a possible confounder. I use expenditure-side real GDP at current purchasing power parities (PPPs), which is best suited to comparing relative living standards across countries. A country’s total population, another control variable, addresses the possibility that larger countries may be more likely to benefit from network effects and economies of scale, which have been positively linked to technology adoption.³⁰ I also include the polity score for regime type. Research suggests that democratic governments provide more favorable environments for technology diffusion, and studies have confirmed this connection in the specific context of internet technologies.³¹

Finally, the baseline model includes two control variables that represent alternative theories of how technological changes differentially advantage advanced economies. First, I include military spending as a proportion of GDP in the regressions. The case studies have interrogated the influential view that military procurement is an essential stimulus for GPT adoption.³² By examining the relationship between military spending and computerization across a large sample of countries, the statistical analysis provides another test of this argument. Moreover, the varieties of capitalism (VoC) scholarship suggests that liberal market economies (LMEs) are especially suited to form the general skills that could aid GPT adoption across sectors. Therefore, the baseline model also controls for whether a country is designated as an LME according to the VoC typology.³³

In terms of model specification, I employ panel-corrected standard errors with a correction for autocorrelation, a typical method for analyzing TSCS data.³⁴ Given the presence of both autocorrelation and heteroskedasticity, I estimate linear models on panel data structures using a two-step Prais-Winsten feasible generalized least squares procedure.³⁵
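For readers unfamiliar with the procedure, the following minimal Python sketch shows the textbook Prais-Winsten transformation for a single series with AR(1) errors. It is not the estimation code behind this chapter: the function names are hypothetical, a single common autocorrelation parameter is assumed, and the panel-corrected standard errors are omitted entirely.

```python
import math


def estimate_rho(residuals):
    """Estimate the AR(1) parameter from first-step OLS residuals."""
    pairs = zip(residuals, residuals[1:])
    numerator = sum(e_prev * e_t for e_prev, e_t in pairs)
    denominator = sum(e_prev ** 2 for e_prev in residuals[:-1])
    return numerator / denominator


def prais_winsten_transform(series, rho):
    """Quasi-difference a series while keeping a rescaled first observation.

    Unlike Cochrane-Orcutt, the first observation is retained and
    multiplied by sqrt(1 - rho^2) rather than dropped.
    """
    first = math.sqrt(1.0 - rho ** 2) * series[0]
    rest = [x_t - rho * x_prev for x_prev, x_t in zip(series, series[1:])]
    return [first] + rest


# Toy residuals and a toy outcome series (invented numbers).
rho_hat = estimate_rho([1.0, 0.5, 0.25])
transformed = prais_winsten_transform([1.0, 2.0, 3.0], rho_hat)
print(rho_hat, transformed)
```

In the second step, the same transformation would be applied to the dependent variable and to every regressor (including the constant) before re-estimating the linear model on the transformed data.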

Time-Series Cross-Sectional Results

Table 6.1 gives the results of the three initial models, which provide further support for the theoretical expectations.³⁶ Model 1 incorporates controls that relate to economic size and level of development. Model 2 adds a control variable for regime type. Lastly, model 3 adds the two variables that represent prominent alternative theories of how technological breakthroughs can differentially advantage certain economies: military spending and liberal market economy status. This also functions as the baseline model. In all three models, the coefficient on the GPT skill infrastructure measure is positive and highly statistically significant (p < .05).

Table 6.1 Results of Time-Series Cross-Sectional Models

	Dependent Variable: Computerization
	(1)	(2)	(3)
GPT skill infrastructure	3.760^**	4.064^***	4.227^**
	(1.643)	(1.676)	(1.666)
GDP per capita	29.754^***	29.319^***	29.435^***
	(3.760)	(3.737)	(3.789)
Total population	6.969^***	7.046^***	6.781^***
	(1.625)	(1.654)	(1.549)
Polity score		−0.456	−0.472^*
		(0.295)	(0.277)
Military spending			−0.940
			(2.413)
Liberal market economy			−2.194
			(3.961)
Constant	−374.599^***	−368.051^***	−361.885^***
	(60.452)	(61.173)	(58.374)
Observations	383	370	370
Note: Standard errors in parentheses.
^*p < .10; ^**p < .05; ^***p < .01

The effect of GPT skill infrastructure on GPT adoption is also substantively significant. Given the coefficient of the GPT skill infrastructure measure in the baseline model,³⁷ a 1 percent increase in the universities per 100,000 people that provide software engineering education results in an increase of the computerization rate by 0.042 percentage points.³⁸ Though the substantive effect seems small at first glance, its magnitude becomes clear when contextualized by differences in GPT skill infrastructure across the sample. For example, in China over this time period, the average number of universities per 100,000 people that met my baseline for GPT skill infrastructure was 0.040. The corresponding figure for the United States was 0.248. According to the coefficient estimate for GPT skill infrastructure, this difference of 520 percent corresponds to a difference of nearly 22 percentage points in the computerization rate.
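The arithmetic behind these figures can be retraced in a few lines of Python. The sketch assumes that the skill infrastructure measure enters the model in logged form, which is inferred from the percent-change interpretation given above rather than stated in this passage; the variable names are hypothetical.

```python
# Coefficient on GPT skill infrastructure in the baseline model (table 6.1, model 3).
coefficient = 4.227

# Average universities per 100,000 people meeting the baseline (from the text).
china = 0.040
united_states = 0.248

# Effect of a 1 percent increase in the measure, in percentage points.
effect_per_percent = coefficient / 100

# Percent difference between the two countries, relative to China.
percent_gap = (united_states - china) / china * 100

# Implied gap in computerization rates, scaling the per-percent effect linearly.
implied_gap = percent_gap * effect_per_percent

print(round(effect_per_percent, 3))
print(round(percent_gap))
print(round(implied_gap, 1))
```

The sketch follows the linear scaling used in the text, multiplying the 520 percent gap by the per-percent effect to arrive at a figure just under 22 percentage points.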

It should be noted that only two control variables, economic development and population, came in as statistically significant in the baseline model. As expected, wealthier countries and more populous countries presided over more intensive adoption of computing technologies. The null result for regime type is worth highlighting, as the effects of democracy on technology adoption are disputed.³⁹ Finally, contrary to the expectations of competing explanations to GPT diffusion theory, the effects of military spending and VoC are insignificant. This is consistent with the findings from the historical case studies.

Quantitative appendix table 1 displays the results after incorporation of three additional controls. First, trade linkages expose countries to advanced techniques and new ideas, opening the door to technology diffusion. A high level of trade openness has been associated with more intensive adoption of information technologies.⁴⁰ Relatedly, there is evidence that a country’s openness to international trade has a positive and sizable effect on various measures of innovation, including high-technology exports, scientific publications, and patents.⁴¹ Second, higher urban density has been linked to faster diffusion of technologies such as the television and the internet.⁴² Model 8 incorporates a trade openness variable and an urbanization variable.

Third, patterns at the regional level could shape how computerization spreads around the world. Scholars have identified such regional effects on the diffusion of ideas, policies, and technologies.⁴³ In model 9, I assess spatial dynamics with dummy variables for the following regions: East Asia and Pacific; Europe and Central Asia; Latin America and Caribbean; the Middle East and North Africa; North America; South Asia; and sub-Saharan Africa.⁴⁴ The positive effect of GPT skill infrastructure on computerization stays strong and highly statistically significant across these two models.

To ensure that the results were not determined by my choice of independent variable, I constructed an alternative specification of GPT skill infrastructure. Reanalyzing data on 467,198 software engineering publications, I counted the number of distinct authors for each country, as a proxy for the breadth of researchers who could foster human capital in software engineering. Though I still maintain that the primary specification best captures software engineering skill infrastructure, this alternative construction guards against possible issues with institution-based indicators, such as problems with institutional disambiguation and nonstandardized author affiliations.⁴⁵ With this alternative specification, the estimated effect of GPT skill infrastructure on computerization remains positive and significant for the baseline model as well as for models with additional controls.⁴⁶

Duration Analysis

When it comes to whether great powers can harness the potential of GPTs for productivity growth, the speed of adoption—not just the intensity of adoption—is pertinent. In the historical case studies, technological leaders adapted more quickly to industrial revolutions because of their investments in widening the base of engineering knowledge and skills associated with GPTs. This leads to the second hypothesis.

H2: Countries with higher levels of GPT skill infrastructure in software engineering will experience faster rates of computer adoption.

In testing this hypothesis, the dependent variable shifts to the amount of time it takes for a country to reach a particular computerization rate. A critical step is to establish both the specific computerization rate that constitutes successful “adoption” of computers as well as when the process of diffusion begins. Regarding the former, I count the “first adoption” of computerization as when the proportion of households with a computer in a country reaches 25 percent. This approach is in line with Everett Rogers’s seminal work on the S-shaped curve for successful diffusion of an innovation, which typically takes off once the innovation reaches a 10 to 25 percent adoption rate.⁴⁷ For the duration analysis, since many of the countries enter the dataset with levels of computer adoption higher than 10 percent, the 25 percent threshold is more suitable.⁴⁸

I take 1995 as the starting point for the diffusion of computers as a GPT. Though an earlier date may be more historically precise, the 1995 date is more appropriate for modeling purposes, as the data on computerization rates for countries before this time are sparse. In a few cases, a country clearly achieved the 25 percent computerization threshold before 1995.⁴⁹ As a practical measure to estimate the duration models, I assume that the time it took for these countries to adopt computers was one year. Right-censoring occurs with the last year of data, 2020, as many countries still had not reached the 25 percent computerization rate.
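The coding rules in the preceding paragraphs can be sketched as a small Python function that turns one country's yearly computerization rates into a duration and an event indicator. This is an illustrative reading of those rules, not the chapter's own code: the function name, the input layout, and the choice to measure duration as the number of years elapsed since 1995 are assumptions.

```python
def time_to_threshold(rates, threshold=25.0, start=1995, end=2020):
    """Return (duration_in_years, event_observed) for one country.

    `rates` maps year -> percent of households with a computer.
    Countries at or above the threshold by the start year are assigned
    a duration of one year; countries that never reach it by the end
    year are treated as right-censored.
    """
    for year in sorted(rates):
        if year > end:
            break
        if rates[year] >= threshold:
            return max(year - start, 1), True
    return end - start, False


# Invented trajectories for three hypothetical countries.
early_adopter = {1994: 30.0, 1995: 33.0}
later_adopter = {1995: 10.0, 2000: 20.0, 2003: 26.0}
not_yet = {1995: 2.0, 2020: 18.0}

for label, series in [("early", early_adopter),
                      ("later", later_adopter),
                      ("censored", not_yet)]:
    print(label, time_to_threshold(series))
```

Lowering `threshold` to 20.0 would give the alternative coding used as a robustness check on the adoption threshold.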

Using these data, I employ a Cox proportional hazards model to estimate the time it takes for countries to reach a 25 percent computerization rate based on the start date of 1995. Often used by political scientists to study conflict duration or the survival of peace agreements, duration models are also commonly used to investigate the diffusion of new technologies and to determine why some firms take longer to adopt a certain technology than others.⁵⁰ Freed from the demands of TSCS analysis for yearly data, the duration analysis expands the country coverage, incorporating all upper-middle-income economies or high-income economies, based on the World Bank’s income group classifications.⁵¹ The resulting sample, which excludes countries that never attained the 25 percent computerization threshold, includes seventy-six countries.

Table 6.2 reports the estimated coefficients from the duration analysis. Positive coefficients match with a greater likelihood of reaching the computerization threshold. I use the same explanatory variable and controls as the baseline model from the TSCS analysis. These variables all enter the model with their measures in 1995. Model 4a takes the 25 percent computerization rate as the adoption threshold, while model 4b adjusts it to 20 percent to ensure that this designation is not driving results.

Table 6.2 Time to Computerization by Country

	Dependent Variable: Time to Computerization
	25% Threshold (4a)	20% Threshold (4b)
GPT skill infrastructure	0.673^***	0.517^***
	(0.137)	(0.119)
GDP per capita	1.186^***	1.110^***
	(0.335)	(0.288)
Total population	0.127	0.062
	(0.085)	(0.074)
Polity score	0.022	0.024
	(0.025)	(0.023)
Military spending	0.017	−0.042
	(0.218)	(0.198)
Liberal market economy	0.760	0.785
	(0.503)	(0.493)
N (number of events)	76 (74)	83 (83)
Likelihood ratio test (df = 6)	112.9^***	111.2^***
Note: Standard errors in parentheses.
^*p < .10; ^**p < .05; ^***p < .01

As the models demonstrate, the effect of GPT skill infrastructure on the speed by which countries achieve computerization is positive and highly statistically significant, providing support for hypothesis 2. Based on model 4a’s hazard ratio for the independent variable (1.96) for a given year, a tenfold increase in software engineering university density doubles the chances of a country reaching the computerization threshold.⁵² These results hold up after introducing additional control variables (quantitative appendix table 3).⁵³
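The link between the coefficients in table 6.2 and the hazard ratio quoted above is the standard one for Cox models: the hazard ratio is the exponential of the coefficient. The short Python check below uses the two reported coefficients; the function and variable names are hypothetical.

```python
import math

coef_4a = 0.673  # GPT skill infrastructure, 25% threshold (table 6.2)
coef_4b = 0.517  # GPT skill infrastructure, 20% threshold (table 6.2)


def hazard_ratio(coefficient):
    """Convert a Cox coefficient (a log hazard ratio) into a hazard ratio."""
    return math.exp(coefficient)


hr_4a = hazard_ratio(coef_4a)
hr_4b = hazard_ratio(coef_4b)

print(f"model 4a hazard ratio: {hr_4a:.2f}")
print(f"model 4b hazard ratio: {hr_4b:.2f}")
```

A hazard ratio close to 2 for model 4a is what underlies the statement that a one-unit increase in the (transformed) independent variable roughly doubles the chance of reaching the threshold in a given year.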

A Cross-Sectional Approach: Averages across the 1995–2020 Period

As an additional check, I collapse the panel dataset into cross-sectional averages of GPT skill infrastructure and computerization over the 1995–2020 period in a large sample of countries. In certain aspects, cross-sectional evidence could be more appropriate for comprehending the impact of features, like skill formation institutions, that are difficult to capture in yearly intervals because they change gradually.⁵⁴ This approach allows for more countries to be included, as the yearly data necessary for TSCS analysis were unavailable for many countries. Limiting the sample based on the same scope conditions as the duration analysis leaves 127 countries.
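Collapsing a panel into cross-sectional averages amounts to taking, for each country, the mean of its available yearly observations. A minimal Python sketch with invented data follows; the row layout and function name are hypothetical, and missing years are simply skipped, which is an assumption about how gaps are handled.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical panel rows: (country, year, computerization rate in percent).
panel = [
    ("A", 1995, 10.0),
    ("A", 1996, 14.0),
    ("A", 1997, 18.0),
    ("B", 1995, 40.0),
    ("B", 1997, 50.0),  # 1996 is missing for country B
]


def country_means(rows):
    """Average all available yearly values for each country."""
    by_country = defaultdict(list)
    for country, _year, value in rows:
        by_country[country].append(value)
    return {country: mean(values) for country, values in by_country.items()}


cross_section = country_means(panel)
print(cross_section)
```

Because each country contributes a single averaged observation, countries with gaps in their yearly series, like B here, can still enter the sample.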

I also include the same set of controls used in the previous analyses of the panel data: GDP per capita, total population, regime type, military spending, and liberal market economies. I employ ordinary least squares (OLS) regression to estimate the model. Since both a scale-location plot and a Breusch-Pagan test demonstrate that heteroskedasticity is not present in the data, it is appropriate to use an OLS regression estimator with normal standard errors.

The results of the regression analysis provide further support for the theoretical expectations. To recap, the independent variable is the estimated average skill infrastructure for software engineering between 1995 and 2020, and the dependent variable is the average computerization rate during the same period. Since analyzing bibliographic information on yearly software engineering publications for 127 countries is a demanding exercise, I estimated the average number of universities that nurture software engineering skills based on publications for the two middle years (2007, 2008) in the dataset, instead of deriving this average based on publication data across the entire time range.⁵⁵ Table 6.3 displays the results, with the incremental inclusion of control variables, also averaged over the period 1995–2020, across three models.⁵⁶ The coefficient on the GPT skill infrastructure measure remains positive and highly statistically significant across all three models (p < .01).

Table 6.3 GPT Skill Infrastructure Predicts More Computerization

	Dependent Variable: Computerization
	(5)	(6)	(7)
GPT skill infrastructure	3.211^***	3.737^***	3.761^***
	(0.528)	(0.609)	(0.649)
GDP per capita	16.617^***	15.536^***	14.977^***
	(1.564)	(1.723)	(1.812)
Total population	−1.647^***	−0.739	−0.831^*
	(0.440)	(0.485)	(0.500)
Polity score		−0.066	−0.070
		(0.147)	(0.180)
Military spending			0.577
			(1.604)
Liberal market economy			5.017
			(4.269)
Constant	−83.099^***	−86.165^***	−79.773^***
	(20.577)	(22.316)	(23.636)
Observations	127	112	110
R²	0.812	0.833	0.834
Note: Standard errors in parentheses.
^*p < .10; ^**p < .05; ^***p < .01

I perform several additional tests to confirm the robustness of the results. I first include the same additional controls used in the preceding TSCS analysis. Quantitative appendix table 4 shows that the main findings are supported. One limitation of models that rely on cross-sectional averages is endogeneity arising from reverse causality. In other words, if greater diffusion of computers throughout the economy spurs more investment in institutions that broaden the pool of software engineers, then this could confound the baseline model’s estimates. To account for this possibility, in model 16 in quantitative appendix table 5, I operationalize GPT skill infrastructure using the estimate for the year 1995, the start of the period, instead of its average level over the 1995–2020 period.⁵⁷ Thus, this model captures the impact of GPT skill infrastructure in 1995 on how computerization progressed over the remaining sample years.⁵⁸ The effect remains positive and statistically significant.

While this chapter’s primary aim is to investigate empirical patterns expected by GPT diffusion theory, the quantitative analysis can also probe whether computerization is positively influenced by institutions that complement LS product cycles. To that end, I add a control variable that stands in for the institutional competencies prioritized by the LS model.⁵⁹ Measures of computer exports and ICT patents serve as two ways to capture a country’s ability to generate novel innovations in the computer industry.⁶⁰ In the resulting analysis, the LS-linked variables do not register as statistically significant.⁶¹

These results should be interpreted with care. In many cases, measures of institutional capacity to build a strong, innovative computer sector may be highly correlated with measures of GPT skill infrastructure. Because the statistical approach struggles to differentiate between the causal processes that connect these two features and computerization, the historical case studies take on the main burden of comparing the GPT diffusion and LS mechanisms against each other. Still, the inclusion of variables linked with the LS mechanism does suggest that, in the context of computerization, there is little evidence that the presence of a strong leading sector spills over into other sectors and generates multiplier effects, a key observable implication of LS accounts.⁶² Additionally, these models drive home the importance of differentiating between institutions linked to innovative activities (in the sense of introducing new products and processes) and engineering-oriented institutions, like GPT skill infrastructure, which are more connected to adoptive activities.⁶³

Summary

Using a variety of statistical methods, this chapter tested the expectation that countries better equipped to widen the base of engineering talent in a GPT will be more successful at diffusing that GPT throughout their economies. The combination of TSCS models, duration analysis, and cross-sectional regressions lends credence to the strength of the relationship between GPT skill infrastructure and computerization. The results hold across a range of additional tests and robustness checks.

There are two major limitations to this chapter’s approach. First, the statistical analysis should be interpreted mainly as an independent evaluation of GPT diffusion theory, not as an additional comparison between GPT diffusion theory and the causal pathway linked to LS product cycles. In the large-scale statistical analysis, indicators linked with the LS mechanism can also be associated with higher computerization rates, making it difficult to weigh the two explanations against each other. The rich historical detail in the case studies therefore provides the prime ground for tracing causal mechanisms.

Second, this chapter evaluates only one aspect of GPT skill infrastructure. A more comprehensive assessment would include not just the capacity to widen the pool of software engineering talent, which was the independent variable in this analysis, but also the strength of information flows between the GPT sector and application sectors. For instance, in the IR-2 case, both the United States and Germany trained large numbers of mechanical engineers, but American technological institutes placed more emphasis on practical training and shop experience, which strengthened connections between the US mechanical engineering education system and industrial application sectors. Building on data collection efforts that are starting to measure these types of linkages, such as the proportion of publications in a technological domain that involve industry-academia collaborations, future research should conduct a more complete assessment of GPT skill infrastructure.⁶⁴

Notwithstanding these limitations, the quantitative analysis backs a key observable implication of GPT diffusion theory: advanced economies’ level of GPT skill infrastructure is strongly linked to GPT adoption rates. Not only does this provide some initial support for the generalizability of this book’s central argument beyond just great powers, but it also gives further credibility to GPT diffusion theory’s relevance to US-China competition today.

7 US-China Competition in AI and the Fourth Industrial Revolution

THE FIRST MACHINE to defeat a human Go champion; powerful language models that can understand and generate humanlike text; a computer program that can predict protein structures and enable faster drug discovery—these are just a few of the newest discoveries in AI that have led some to declare the arrival of a Fourth Industrial Revolution (IR-4).¹ As for the latest geopolitical trends, China’s rise has been the dominant story of this century as national security establishments grapple with the return of great power competition. Located squarely at the intersection of these two currents, the US-China technological rivalry has become an inescapable topic of debate among those interested in the future of power—and the future itself.

Who will lead the way in the Fourth Industrial Revolution? To answer this question, leading thinkers and policymakers in the United States and China are both drawing lessons from past technology-driven power transitions to grapple with the present landscape. Unfortunately, as this chapter argues, they have learned the wrong lessons. Specifically, the leading-sector perspective holds undue influence over thinking about the relationship between technological change and the possibility of a US-China power transition. Yet careful tracing of historical cases and statistical analysis have revealed that the GPT mechanism provides a better model for how industrial revolutions generate the potential for a power transition. When applied to the IR-4 and the evolving US-China power relationship, GPT diffusion theory produces different insights into the effects of the technological breakthroughs of today on the US-China power balance, as well as on the optimal strategies for the United States and China to pursue.

This chapter sketches out the potential impacts of today’s emerging technologies on the US-China power balance. I first describe the current productivity gap between the United States and China, with particular attention to concerns that the size of this gap invalidates analogies to previous rising powers. Next, I review the array of technologies that have drawn consideration as the next GPT or next LS. Acknowledging the speculative nature of technological forecasting, I narrow my focus to developments in AI because of its potential to revitalize growth in ICT industries and transform the development trajectories of other enabling technologies.

The essence of this chapter is a comparison of the implications of the LS and GPT mechanisms for how advances in AI will affect a possible US-China economic power transition. In contrast to prevailing opinion, which hews closely to the LS template, GPT diffusion theory suggests that the effects of AI on China’s rise will materialize through the widespread adoption of AI across many sectors in a decades-long process. The institutional factors most pertinent to whether the United States or China will more successfully benefit from AI advances are related to widening the base of AI-related engineering skills and knowledge. I also spell out how the implications of GPT diffusion theory for the US-China power balance differ from those derived from alternative explanations.

The objective here is not to debate whether China will or will not catch up to the United States, or whether technological capabilities are more significant than all other considerations that could affect China’s long-term economic growth. Rather, this chapter probes a more limited set of questions: If emerging technologies were to significantly influence the US-China economic power balance, how would this occur? Which country is better positioned to take advantage of the Fourth Industrial Revolution? What would be the key institutional adaptations to track?²

A Power Transition in Progress?

Over the past four decades, there has been no greater shift in the global economic balance than China’s rise. China is either already the world’s largest economy, if measured by purchasing power parity (PPP) exchange rates, or is projected to soon overtake the United States, based on nominal exchange rates.³ China’s impressive economic growth has led many to proclaim that the era of US hegemony is over.⁴

Economic size puts China in contention with the United States as a great power competitor, but China’s economic efficiency will determine whether a power transition occurs. Countries like Switzerland outpace the United States on some measures of economic efficiency, but they lack the economic size to contend. Other rising powers, such as India, boast large economies but lag far behind in economic efficiency. Mike Beckley concludes: “If the United States faces a peer competitor in the twenty-first century … it will surely be China.”⁵ For this conditional to be true, China’s productivity growth is critical.⁶ This is not the first time in history that China has boasted the world’s largest economy; after all, it held that distinction even as Britain was taking over economic leadership on the back of the First Industrial Revolution.

Where does China currently stand in comparison to the productivity frontier? Based on 2018 figures, China’s real GDP per capita (at 2010 PPPs) is about 30 percent that of the United States.⁷ From 2000 to 2017, China’s total factor productivity (TFP) never surpassed 43 percent of US TFP (figure 7.1).⁸ In 2015, labor productivity in China remained at only 30 percent of that in the United States, though this figure had doubled over the past two decades.⁹

These numbers suggest that China sits much further from the productivity frontier than past rising powers. If the US-China power relationship is fundamentally different from those in previous eras, the relevance of conclusions from previous cases could be limited.¹⁰ For instance, in the early years of the IR-1, Britain was only slightly behind the Netherlands, the productivity leader at the time. The United Kingdom’s GDP per capita was 80 percent of Dutch GDP per capita in 1800.¹¹ Similarly, at the beginning of the IR-2, the United States trailed Britain in productivity by a small margin. In 1870, labor productivity and TFP in the United States were 90 percent and 95 percent, respectively, of labor productivity and TFP in Britain.¹² During the 1870s, average GDP per capita in the United Kingdom was about 15 percent higher than average GDP per capita in the United States.¹³

FIGURE 7.1. US-China Productivity Gap (2000–2017). *Source*: Penn World Table version 9.1; Feenstra, Inklaar, and Timmer 2015.

Still, that China could surpass the United States in productivity leadership is not outside the realm of possibility.¹⁴ The IR-3 case is a better comparison point for the current productivity gap between the United States and China. In 1960, the start of the IR-3 case, Japanese GDP per capita was 35 percent of US GDP per capita.¹⁵ At the time, Japan’s TFP was 63 percent of US TFP, and Japan’s labor productivity was only 23 percent of US labor productivity, a lower proportion than the China-US ratio at present.¹⁶ Despite the initial chasm, the TFP gap between the United States and Japan narrowed to only 5 percent in 1991.¹⁷
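The comparison across cases can be laid out as a small lookup. The sketch below uses only the percentages quoted in this section, with two caveats that are assumptions of the sketch rather than claims of the source: China's TFP entry is the upper bound for 2000–2017 (it "never surpassed" 43 percent), and the IR-2 GDP per capita entry is derived from the statement that the UK figure was about 15 percent higher, giving roughly 100/1.15 ≈ 87 percent. The names `HISTORICAL_CASES`, `CHINA_TODAY`, and `closest_case` are illustrative.

```python
# Follower's level as a percentage of the leader's, from the figures quoted
# in the text. None marks a measure the text does not report for that case.
HISTORICAL_CASES = {
    "IR-1 (Britain vs. Netherlands, 1800)": {
        "gdp_per_capita": 80, "labor_productivity": None, "tfp": None,
    },
    "IR-2 (US vs. Britain, 1870s)": {
        # UK GDP per capita about 15 percent higher, so US is about 100 / 1.15.
        "gdp_per_capita": round(100 / 1.15), "labor_productivity": 90, "tfp": 95,
    },
    "IR-3 (Japan vs. US, 1960)": {
        "gdp_per_capita": 35, "labor_productivity": 23, "tfp": 63,
    },
}

# China relative to the US; the TFP entry is an upper bound for 2000-2017.
CHINA_TODAY = {"gdp_per_capita": 30, "labor_productivity": 30, "tfp": 43}


def closest_case(measure):
    """Return the historical case whose ratio is nearest to China's on one measure."""
    gaps = {
        name: abs(values[measure] - CHINA_TODAY[measure])
        for name, values in HISTORICAL_CASES.items()
        if values[measure] is not None
    }
    return min(gaps, key=gaps.get)


for measure in CHINA_TODAY:
    print(f"{measure}: closest historical case is {closest_case(measure)}")
```

On all three measures the nearest historical starting point is Japan in 1960, which is consistent with treating the IR-3 case as the more informative analogy.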

Indeed, productivity growth is crucial for China to sustain its economic rise in the long term. For the 1978–2007 period, Xiaodong Zhu decomposed the sources of China’s economic growth into labor deepening, human capital, capital deepening, and total factor productivity growth. He found that growth in TFP accounted for 78 percent of China’s growth in GDP per capita.¹⁸ The burden on TFP improvements will only increase, given the diminishing impact of other drivers of China’s growth miracle, such as urbanization and demographic dividends.¹⁹

Whether China can sustain productivity growth is an open question. Plagued by inefficient infrastructure outlays, China’s aggregate TFP growth declined from 2.8 percent in the decade before the global financial crisis to 0.7 percent in the decade after (2009–2018).²⁰ If calculated using alternative estimates of GDP growth, China’s TFP growth was actually negative from 2010 to 2017, averaging −0.5 percent.²¹ This productivity slowdown is not unique to China. Even before the 2008 global financial crisis, there was a slowdown in TFP growth among advanced economies due to waning effects from the information and communications technologies (ICTs) boom.²² China’s labor productivity growth also decelerated from 8.1 percent in 2000–2007 to 4.2 percent in 2011–2019, but the later period’s slowed growth rate was still six times greater than the US labor productivity growth rate in the same period.²³

Adaptation to technological advances will be central to China’s prospects of maintaining high rates of productivity growth. China’s leaders worry about getting stuck in the “middle-income trap,” a situation in which an economy is unable to advance to high-income status after it exhausts export-driven, low-cost manufacturing advantages. Many studies have stressed the linkage between China’s capacity to develop and absorb emerging technologies and its prospects for escaping the middle-income trap.²⁴ The Chinese government also increasingly pushes the development and adoption of information technology and other cutting-edge technologies as a way to increase TFP.²⁵ Thus, tracking China’s future productivity growth necessitates a better understanding of specific technological trajectories in the current period.

Key Technological Changes in the IR-4

ICTs, the key technological drivers of the IR-3, are still at the heart of the IR-4. From visionaries and daydreamers to economists and technology forecasters, there is a wide-ranging consensus that AI will breathe new life into the spread of digitization. The World Economic Forum calls AI the “engine that drives the Fourth Industrial Revolution.”²⁶ Kai-Fu Lee, the former head of Google China, boldly asserts, “The AI revolution will be on the scale of the Industrial Revolution but probably larger and definitely faster.”²⁷ To further explore AI’s role in the IR-4, I examine this technological domain as a source of both GPT and LS trajectories.

Candidate Leading Sectors

LS accounts forecast that in future waves of technological change ICTs will continue to drive economic transformation. According to one analysis of US-China technological rivalry in the twenty-first century, ICTs are “widely regarded as the current leading sector.”²⁸ I reviewed five key texts that predicted future leading sectors, all written by scholars who study historical cycles of technological change and global leadership transitions.²⁹ These forecasts also highlight other candidate leading sectors, including lasers and new sources of energy, but they converge on ICTs as the leading sector of the next wave of technological disruption.

Informed by the LS model, AI’s effects on global technological competition are often framed through its potential to open up new opportunities for latecomers to catch up and leapfrog advanced countries in key segments like AI chips. China’s national AI development plan outlines its ambition to become the world’s leading center of innovation in AI by 2030.³⁰ Scholars analyze China’s capacity to develop global intellectual monopolies in certain AI applications and enhance independent innovation in AI so as to guard against other countries leveraging weaponized interdependence.³¹ Descriptions of China’s AI strategy as aimed toward seizing “the commanding heights” of next-generation technologies reflect the belief that competition in AI will be over global market shares in strategic sectors.³²

Candidate GPTs

Among the possible GPTs that could significantly impact a US-China economic power transition, AI stands out. Like the literature on leading sectors, the GPT literature also converges on ICTs as a continued driver of technological revolution. Kenneth Carlaw, Richard Lipsey, and Ryan Webb, three pioneers of GPT-based analysis, identify programmable computing networks as the basic GPT that is driving the modern ICT revolution.³³ Crucially, AI could open up a new trajectory for this ICT revolution. Recent breakthroughs in deep learning have improved the ability of machines to learn from data in fundamental ways that can apply across hundreds of domains, including medicine, transportation, and other candidate GPTs like biotechnology and robotics. This is why AI is often called the “new electricity”—a comparison to the prototypical GPT. Economists regard it as the “next GPT”³⁴ and “the most important general-purpose technology of our era.”³⁵

Several studies have found evidence for a GPT trajectory in AI. One study, using a novel dataset of preprint papers, finds that articles on deep learning conform with a GPT trajectory.³⁶ Using patent data from 2005 to 2010 to construct a three-dimensional indicator for the GPT-ness of a technology, Petralia ranks technological classes based on their GPT potential.³⁷ His analysis finds that image analysis, a field that is closely tied to recent advances in deep learning and AI, ranks among the top technological classes in terms of GPT-ness.³⁸ Another effort employs online job posting data to differentiate among the GPT-ness of various technological domains, finding that machine learning technologies are more likely to be GPTs than other technologies such as blockchain, nanotechnology, and 3D printing.³⁹

To be sure, forecasts of future GPTs call attention to other technological trends as well. Other studies have verified the GPT potential of biotechnology.⁴⁰ Robotics, another candidate GPT for the IR-4, could underpin “the next production system” that will boost economy-wide productivity, succeeding the previous one driven by information technology.⁴¹ While my primary reasons for limiting my analysis to AI are based on space constraints as well as on its prominence in the surrounding literature, it is also important to note that developments in both biotechnology and robotics are becoming increasingly dependent on advances in deep learning and big data.⁴²

The Limits of Technological Foresight

Unlike exercises to pinpoint key technologies of previous industrial revolutions, which benefited from hindsight, identifying technological drivers in the IR-4 is a more speculative exercise. It is difficult to find true promise amid the hype. The task is made harder by the fact that even experts and technological forecasting bodies regularly miss the next big thing. In 1945, a team led by Dr. Theodore von Kármán, an eminent aerospace engineer, published Toward New Horizons, a thirty-two-volume text about the future of aviation. The study failed to foresee major new horizons such as the first human in space, intercontinental ballistic missiles (ICBMs), and solid-state electronics—all of which emerged within fifteen years.⁴³ In the early 1990s, the US Army conducted a technology forecast assessment to identify the technologies most likely to transform ground warfare in the next century. When the forecast was evaluated in 2008 by the Army’s senior scientists and engineers, it graded out at a “C.” Among its most significant misses was the development of the internet.⁴⁴

I am no Cassandra. It is very possible that if I were writing this book in 2000, this chapter would focus on the promise of nanotechnology, not AI. At that time, President Bill Clinton had just unveiled the National Nanotechnology Initiative. In a 2003 speech, Philip J. Bond, the undersecretary for technology at the Department of Commerce at the time, declared:

Nano’s potential rises to near Biblical proportions. It is not inconceivable that these technologies could eventually achieve the truly miraculous: enabling the blind to see, the lame to walk, and the deaf to hear; curing AIDS, cancer, diabetes and other afflictions; ending hunger; and even supplementing the power of our minds, enabling us to think great thoughts, create new knowledge, and gain new insights.⁴⁵

Decades later, there is a collective exhaustion around the hype surrounding nanotechnology, a phenomenon one scientist calls “nanofatigue.”⁴⁶

One lesson that stands out from mapping the technological landscape in past industrial revolutions is that the most significant GPTs of an era often have humble origins. In the IR-2, new innovations in electricity and chemicals garnered the most attention, but America’s economic rise owed more to advances in machine tools that were first introduced decades earlier. In the same way, “old” GPTs like electricity could still shock the world.⁴⁷ Today there is still a lot of potential for expanded industrial electrification, which could have a major impact on productivity.⁴⁸ Similarly, high-capacity battery technologies could transform productivity on a broad scale.⁴⁹ Interestingly, patent data also demonstrate the continued importance of electrical technologies. Among the top ten GPT candidates as ranked by Petralia’s indicator of “GPT-ness,” there were as many technological classes in the electrical and electronic category as there were in the computers and communications category.⁵⁰

For my purposes, it is reassuring that developments in AI also draw on a long history. In the United States, the legitimization of AI as an important field of research dates back to the 1960s.⁵¹ Thus, though the rest of the chapter takes AI as the most important GPT for the near future, it does so with a humble mindset, acknowledging that looking forward to the future often starts with digging deeper into the past.

The GPT vs. LS Mechanisms in the IR-4

There has been no shortage of speculation about whether the United States or China is better fit for the new AI revolution. Each week it seems there is a new development in the “AI arms race” between the two nations.⁵² Many believe that China is an AI superpower on the verge of overtaking the United States in the key driver of the IR-4.⁵³ As the following sections will show, these discussions tend to follow the LS template in their assumptions about the trajectory of AI development and key institutional adjustments.

Conversely, GPT diffusion theory provides an alternative model for how AI could affect the US-China power balance, with implications for the optimal institutional adaptations to the AI revolution. I conclude that, if the lessons of past industrial revolutions hold, the key driver of a possible US-China economic power transition will be the relative success of these nations in diffusing AI throughout their economies over many decades. This technological pathway demands institutional adaptations to widen the base of AI engineering skills and knowledge. While recognizing that international competition over AI is still in the early innings, this chapter outlines a preliminary framework for assessing which country’s roster is better equipped for success.

Impact Timeframe: The Decisive Years in the US-China AI Competition

If guided by the LS mechanism, one would expect the impacts of AI on US-China power competition to be very significant in the early stages of the technology’s trajectory. Indeed, many prominent voices have articulated this perspective. Consider, for example, a report titled “Is China Beating the US to AI Supremacy?,” authored by Professor Graham Allison, the director of Harvard Kennedy School’s Belfer Center for Science and International Affairs, and Eric Schmidt, former CEO of Google and cochair of the National Security Commission on Artificial Intelligence (NSCAI). For Allison and Schmidt, the decisive years in US-China AI competition are just around the corner. Assuming that AI advances will be rapidly adopted across many economic domains, their aim is to “sound an alarm over China’s rapid progress and the current prospect of it overtaking the United States in applying AI in the decade ahead.”⁵⁴ Shaped by a similar framework, other influential texts also predict that China’s productivity boost from AI will come to fruition in the 2020s.⁵⁵

If GPT diffusion theory serves as the basis for analysis, these influential texts severely underestimate the time needed for economic payoffs from AI. Historical patterns of GPT advance have borne out that, even in early adopter countries, it takes at least three or four decades for these fundamental technologies to produce a significant productivity boost.⁵⁶

Using this pattern as guidance, we can roughly project AI’s impact timeframe, after establishing an initial emergence date for this GPT. The 2012 AlexNet submission to ImageNet, a competition that evaluates algorithms on large-scale image classification, is widely recognized as spurring the current deep learning–based paradigm of AI development.⁵⁷ If the emergence date is instead defined as the point when a GPT achieves a 1 percent adoption rate in the median sector, the AI era probably began in the late 2010s.⁵⁸ As of 2018, according to the most recent census survey on the extent of AI adoption in the US economy, only 2.75 percent of firms in the median sector reported using AI technologies.⁵⁹ Thus, regardless of which arrival date is used, if AI, like previous GPTs, requires a prolonged period of gestation, substantial productivity payoffs should not materialize until the 2040s and 2050s.⁶⁰
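The projection above is simple arithmetic, and a short sketch makes the sensitivity to the chosen emergence date explicit. It assumes a lag of thirty to forty years, taken from the "three or four decades" pattern, which the text presents as a minimum. The year 2018 is a placeholder for "the late 2010s" and is not a date given in the source. The function names are illustrative.

```python
def payoff_window(emergence_year, min_lag=30, max_lag=40):
    """Earliest and latest year implied by a lag of three to four decades."""
    return emergence_year + min_lag, emergence_year + max_lag


def decades_spanned(start_year, end_year):
    """Label every decade touched by the window, e.g. ['2040s', '2050s']."""
    first = start_year // 10 * 10
    last = end_year // 10 * 10
    return [f"{decade}s" for decade in range(first, last + 1, 10)]


# 2012 is the AlexNet date from the text; 2018 is a placeholder for "late 2010s".
scenarios = [("AlexNet breakthrough", 2012), ("1 percent adoption (assumed year)", 2018)]
for label, year in scenarios:
    start, end = payoff_window(year)
    print(f"{label}: {start}-{end} -> {', '.join(decades_spanned(start, end))}")
```

Under either starting point the window falls in the 2040s and 2050s, so the conclusion does not hinge on which emergence date is preferred.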

Of course, other factors could affect AI’s expected impact timeframe, including the possibility that the general process of technological adoption is accelerating. Some evidence indicates that the waiting time for a significant productivity boost from a new GPT has decreased over time.⁶¹ Lee argues that the AI revolution will be faster than previous GPT trajectories owing to the increasingly frictionless distribution of digital algorithms and more mature venture-capital industry.⁶² Nevertheless, preliminary evidence suggests that AI will face similar implementation lags as previous GPTs, including bottlenecks in access to computing resources, human capital training, and business process transformations.⁶³

Phase of Relative Advantage: Innovation-centrism and China’s AI Capabilities

Debates about China’s scientific and technological power reduce complex dynamics to one magic word—“innovation.”⁶⁴ Whether China can generate novel technologies is often the crux of debates over China’s growing scientific and technological capabilities and a potential US-China power transition.⁶⁵ For David Rapkin and William Thompson, the prospect of China overtaking the United States as the leading power is dependent on “China’s capacity to innovate”—specifically as it relates to revolutionary technological changes that allow challengers to leapfrog the leader in economic competition.⁶⁶ “If … China’s innovativeness continues to lag a considerable distance behind that of the US, then China overtaking the US might wait until the twenty-second century,” they posit.⁶⁷ China’s innovation imperative, as Andrew Kennedy and Darren Lim describe it in language common to LS analysis, is motivated by “monopoly rents generated by new discoveries.”⁶⁸

Innovation-centric views of China’s AI capabilities paint an overly optimistic picture of China’s challenge to US technological leadership. Allison and Schmidt’s Belfer Center paper, for instance, emphasizes China’s growing strength in AI-related R&D investments, leading AI start-ups, and valuable internet companies.⁶⁹ Likewise, the NSCAI’s final report suggests that China is poised to overtake the United States in the capacity to generate new-to-the-world advances in AI, citing shares of top-cited, breakthrough papers in AI and investments in start-ups.⁷⁰ These evaluations match up with viewpoints that are bullish on China’s overall technological capabilities, which also point to its impressive performance along indicators of innovation capacity, such as R&D expenditures, scientific publications, and patents.⁷¹

Some other comparisons of US and Chinese AI capabilities arrive at the opposite conclusion but still rely on the LS template. For instance, two Oxford scholars, Carl Frey and Michael Osborne, have likened claims that China is on the verge of overtaking the United States in AI to overestimates of Japan’s technological leadership in computers in the 1980s. In their view, just like Japan, China will fail to overtake the United States as the world’s technological leader because of its inability to produce radical innovations in AI. In fact, they claim, the prospects are even bleaker this time around: “China, if anything, looks less likely to overtake the United States in artificial intelligence than Japan looked to dominate in computers in the 1980s.”⁷²

If analysis of US-China competition in AI were centered on GPT diffusion theory, it would focus more on China’s capacity to widely adopt AI advances. In this scenario, it is neither surprising nor particularly alarming that China, like other great power contenders such as Japan in the IR-3, Germany in the IR-2, and France in the IR-1, contributes to fundamental innovations. No one country will corner all breakthroughs in a GPT like AI. The key point of differentiation will be the ability to adapt and spread AI innovations across a wide array of sectors.

A diffusion-centric perspective suggests that China is far from being an AI superpower. Trends in ICT adoption reveal a large gap between the United States and China. China ranks eighty-third in the world on the International Telecommunication Union’s ICT development index, a composite measure of a country’s level of networked infrastructure, access to ICTs, and adoption of ICTs.⁷³ By comparison, the United States sits among the world’s leaders at fifteenth. Though China has demonstrated a strong diffusion capacity in consumer-facing ICT applications, such as mobile payments and food delivery, Chinese businesses have been slow to embrace digital transformation.⁷⁴

In fact, it is often Chinese scholars and think tanks that acknowledge these deficiencies. According to an Alibaba Research Institute report, China significantly trails the United States in penetration rates of many digital technologies across industrial applications, including digital factories, industrial robots, smart sensors, key industrial software, and cloud computing.⁷⁵ China also significantly trails the United States in an influential index for adoption of cloud computing, which is essential to implementing AI applications.⁷⁶ In 2018, US firms averaged a cloud adoption rate of over 85 percent, more than double the comparable rate for Chinese firms.⁷⁷

To be fair, China has achieved some success in adopting robots, a key application sector of AI. China leads the world in total installations of industrial robots. Aided by favorable industry composition and demographic conditions, China added 154,000 industrial robots in 2018, which was more than were installed by the United States and Japan combined.⁷⁸ Based on 2021 data from the International Federation of Robotics, China outpaces the United States in robot density as measured by the number of industrial robots per 10,000 manufacturing employees.⁷⁹

However, China’s reputed success in robot adoption warrants further scrutiny. The IFR’s figures for employees in China’s manufacturing sector significantly underestimate China’s actual manufacturing workforce. If these figures are revised to be more in line with those from the International Labor Organization (ILO), China’s robot density would fall to less than 100 robots per 10,000 manufacturing employees, which would be around one-third of the US figure.⁸⁰ On top of that, talent bottlenecks hamper robot diffusion in China, since skilled technicians are required to reprogram robots for specific applications.⁸¹ An unused or ineffective robot counts toward robot density statistics but not toward productivity growth.
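The sensitivity of robot density to the workforce denominator can be made concrete with a short calculation. The sketch below is illustrative only: the function name and every input figure are hypothetical placeholders, not IFR or ILO data. It shows only that, for a fixed robot stock, a larger estimate of the manufacturing workforce lowers the measured density proportionally.

```python
def robot_density(robot_stock: float, manufacturing_employees: float) -> float:
    """Return industrial robots per 10,000 manufacturing employees."""
    if manufacturing_employees <= 0:
        raise ValueError("manufacturing_employees must be positive")
    return robot_stock * 10_000 / manufacturing_employees


# Hypothetical placeholder inputs, chosen only to show the arithmetic.
ROBOT_STOCK = 1_000_000
NARROW_WORKFORCE = 40_000_000   # a smaller workforce estimate
BROAD_WORKFORCE = 120_000_000   # a workforce estimate three times larger

narrow_density = robot_density(ROBOT_STOCK, NARROW_WORKFORCE)
broad_density = robot_density(ROBOT_STOCK, BROAD_WORKFORCE)

print(f"narrow denominator: {narrow_density:.1f} robots per 10,000 employees")
print(f"broad denominator:  {broad_density:.1f} robots per 10,000 employees")
```

Because the robot stock is held constant, tripling the workforce estimate cuts the density to one-third, which is why the choice of employment source matters as much as the installation count.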

Breadth of Growth: Picking Winners vs. Horizontal Approaches to AI Development

Divergent perspectives on the breadth of growth in technological revolutions also separate LS-based and GPT-based views of the US-China case. If technological competition in the IR-4 is limited to which country gets a bigger share of the market in new leading industries like AI, then direct sectoral interventions in the mold of China’s AI strategy could be successful. However, if the breadth of growth in the IR-4 follows the GPT trajectory of the three previous industrial revolutions, another approach will be more effective.

China’s AI strategy has hewed closely to the LS model. This approach builds off a series of directives that prioritize indigenous innovation in select frontier technologies, an emphasis that first appeared in the 2006 “National Medium- and Long-Term Plan for the Development of Science and Technology” and extends through the controversial “Made in China 2025” plan.⁸² Since the mid-2000s, the number of sectoral industrial policies issued by the State Council, China’s cabinet-level body, has significantly increased.⁸³ Appropriately, the State Council’s 2017 AI development plan outlined China’s ambitions to become the world’s primary innovation center for AI technology.⁸⁴

On the breadth of growth dimension, tension between GPT diffusion theory and China’s application of the LS template is rooted in differing expectations for how the economic boost from revolutionary technologies will unfold. Take, for example, China’s 2010 “Strategic Emerging Industries” (SEI) initiative, which targets seven technological sectors based on opportunities for China to leapfrog ahead in new industries.⁸⁵ Oriented around assumptions that a limited number of technologically progressive industries will drive China’s future growth, the SEI defines success based on the resultant size of these industries, as measured by their value added as a share of GDP.⁸⁶

In contrast, GPT diffusion theory expects that, in the country that best capitalizes on the IR-4, productivity growth will be more dispersed. In this view, the AI industry never needs to be one of the largest, provided that AI techniques trigger complementary innovations across a broad range of industries. Relatedly, some Chinese thinkers have pushed back against industrial policies that favor narrow technology sectors. A research center under China’s own State Council, in a joint analysis with the World Bank, concluded in 2012: “A better innovation policy in China will begin with a redefinition of government’s role in the national innovation system, shifting away from targeted attempts at developing specific new technologies and moving toward institutional development and an enabling environment that supports economy-wide innovation efforts within a competitive market system.”⁸⁷ The economy-wide transformation enabled by AI, if it lives up to its potential as a GPT, demands a more broad-based response.

When it comes to technology policy, there is always a push and pull between two ends of a spectrum. Vertical industrial policy, or “picking winners,” targets certain technologies, often leading to top-down intervention to ensure that the nation’s firms are competitive in specific industries. Horizontal industrial policy promotes across-the-board technological development and avoids labeling certain technologies as more significant than others. This book argues that both camps have it partly right, at least when it comes to ensuring long-term economic growth in times of technological revolution. Picking technological winners is needed in that some technologies do matter more than others; however, the “winners” are GPTs, which require horizontal industrial policies to diffuse across many application sectors. Institutions for skill formation in AI engineering, the subject of the next section, split the difference between these two approaches.

Institutional Complementarities: GPT Skill Infrastructure in the IR-4

In 2014, Baidu, one of China’s leading tech giants, hired Andrew Ng away from Google, poaching the cofounder of Google’s deep learning team. Three years later, Baidu lured Qi Lu away from Microsoft, where he had served as the architect of the company’s AI strategy. Their departures were headline news and spurred broader discussions about China’s growing AI talent.⁸⁸

When Alibaba, another one of China’s tech giants, celebrated its listing on the Hong Kong stock exchange in November 2019, it showcased a different form of AI talent. In one picture of the gong-ringing celebration, Yuan Wenkai, who works for an Alibaba-owned logistics warehouse, stood third from the right. A former tally clerk who graduated from a run-of-the-mill Guangdong vocational school, Yuan holds substantial expertise in automation management. His success in boosting the sorting capacity of a logistics warehouse by twenty thousand orders per hour, in response to elevated demand during the shopping frenzy of Singles’ Day (November 11), earned him an invitation to the ceremony.⁸⁹

Even as AI systems exceed human-level performance at tasks ranging from playing Go to translating news articles, human talent will remain crucial for designing and implementing such systems.⁹⁰ According to one global survey of more than three thousand business executives, landing the “right AI talent” ranked as the top barrier to AI adoption for companies at the frontier of incorporating AI into their products, services, and internal processes.⁹¹ But what makes up the “right AI talent”? In its distilled form, GPT diffusion theory suggests that China’s chance of leading the AI revolution rests more on the Yuan Wenkais of the world than the Andrew Ngs. The most important institutional adjustments to the IR-4 are those that widen the pool of AI engineering skills and knowledge.

Indeed, alongside the maturation of the AI field, recent reports have emphasized skills linked to implementing theoretical algorithms in practice and in ways suited for large-scale deployment. In early 2022, the China Academy of Information and Communications Technology (CAICT), an influential research institute under the Ministry of Industry and Information Technology, published two reports that identified AI’s “engineering-ization” (工程化) as a significant trend that involves addressing challenges in transforming AI-based projects from prototypes to large-scale production.⁹² Relatedly, per Burning Glass job postings from 2010 to 2019, the overall increase in demand for “AI-adjacent” positions in the United States far exceeded that for “core AI” positions.⁹³ Covering skills needed to implement AI throughout many sectors and legacy systems, this pool of AI-adjacent jobs includes positions for systems engineers and software development engineers.

GPT Skill Infrastructure for AI: A US-China Comparison

At present, the United States is better positioned than China to develop the skill infrastructure suitable for AI. First, the United States has more favorable conditions for expanding the number of AI engineers. According to three separate projects that mapped out the global AI talent landscape, many more AI engineers work in the United States than in any other country.⁹⁴ In 2017, Tencent Research Institute and BOSS Zhipin (a Chinese online job search platform) found that the number of AI “practitioners” (从业者) in the United States far surpassed the corresponding Chinese figure. Figure 7.2 captures this gap across four key AI subdomains: natural language processing (by three times), chips and processors (by fourteen times), machine learning applications (by two times), and computer vision (by three times).⁹⁵ Overall, the total number of AI practitioners in the United States was two times greater than the corresponding figure for China.⁹⁶ Furthermore, data from two separate reports by LinkedIn and SCMP Research confirm that the United States leads the world in AI engineers.⁹⁷

In addition to statistics on the AI workforce, the quantity and quality of AI education is another consideration in assessing which country is better positioned to develop GPT skill infrastructure for AI. Again, the United States leads China by a significant margin in terms of universities with faculty who are proficient in AI. In 2017, the United States was home to nearly half of the world’s 367 universities that provide AI education, operationalized as universities with at least one faculty member who has published at least one paper in a top AI conference.⁹⁸ In comparison, China had only 20 universities that met this standard. When this methodology is replicated for the years 2020–2021, the US advantage remains pronounced, with 159 universities to China’s 29.⁹⁹

FIGURE 7.2. A US-China Comparison of AI Practitioners in Key Subdomains. *Source*: Tencent Research Institute and BOSS Zhipin 2017.

These findings contradict prominent views on the present global distribution of AI engineering talent. In his best-selling book AI Superpowers: China, Silicon Valley, and the New World Order, Kai-Fu Lee argues that the current AI landscape is shifting from an age of discovery, when the country with the highest-quality AI experts wins out, to an age of implementation, when the country with the largest number of sound AI engineers is advantaged.¹⁰⁰ In an age of implementation, Lee concludes, “China will soon match or even overtake the United States in developing and deploying artificial intelligence.”¹⁰¹ Pitted against the statistics from the previous passages, Lee’s evidence for China’s lead in AI implementers is meager. His attention is concentrated on anecdotes about the voracious appetite for learning about AI by Chinese entrepreneurs in Beijing.¹⁰² While this analysis benefits from Lee’s experience as CEO of Sinovation Ventures, a venture capital fund that invests in many Chinese AI start-ups, it could also be colored by his personal stake in hyping China’s AI capabilities.

Drawing on Lee’s book, Allison and Schmidt also assert that China is cultivating a broader pool of AI talent than the United States today. Specifically, they point out that China graduates three times as many computer science students as the United States on an annual basis.¹⁰³ Yet the study on which this figure is based finds that computer science graduates in the United States have much higher levels of computer science skills than their Chinese peers. In fact, the average fourth-year computer science undergraduate in the United States scores higher than seniors from the top programs in China.¹⁰⁴ Therefore, estimates of China’s pool of AI engineering talent will be misleading if they do not establish some baseline level of education quality. This is another reason to favor the indicators that support an enduring US advantage in AI engineers.¹⁰⁵

Second, as previous industrial revolutions have demonstrated, strong linkages between entrepreneurs and scientists that systematize the engineering knowledge related to a GPT are essential to GPT skill infrastructure. In the AI domain, an initial evaluation suggests that this connective tissue is especially strong in the United States. Based on 2015–2019 data, the United States led the world with the highest number of academic-corporate hybrid AI publications (defined as those coauthored by at least one researcher from industry and at least one from academia), more than double the number of such publications from China.¹⁰⁶ Xinhua News Agency, China’s most influential media organization, has pinpointed the lack of technical exchanges between academia and industry as one of five main shortcomings in China’s AI talent ecosystem.¹⁰⁷

These preliminary indicators align with assessments of the overall state of industry-academia exchanges in China. Barriers to stronger industry-academia linkages include low mobility between institutions, aimless government-sponsored research collaborations, and misguided evaluation incentives for academic researchers.¹⁰⁸ One indicator of this shortcoming is the share of R&D outsourced by Chinese firms to domestic research institutes: this figure declined from 2.4 percent in 2010 to 1.9 percent in 2020. Over the same time period, the share of Chinese firms’ R&D expenditures performed by domestic higher education institutions also decreased from 1.2 percent to 0.4 percent.¹⁰⁹

Moreover, the US approach to AI standard-setting could prove better suited to coordinating information flows between labs working on fundamental AI advances and specific application sectors. Market-mediated, decentralized standardization systems are particularly suited for advancing technological domains characterized by significant uncertainty about future trajectories, which clearly applies to AI.¹¹⁰ In such fields, governments confront a “blind giant’s quandary” when attempting to influence technological development through standard-setting.¹¹¹ The period when government involvement can exert the most influence over the trajectory of an emerging technology coincides with the time when the government possesses the least technical knowledge about the technology. Government intervention therefore could lock in inferior AI standards compared with market-driven standardization efforts.

In that light, China’s state-led approach to technical standards development could hinder the sustainable penetration of AI throughout its economy. For example, the Chinese central government plays a dominant role in China’s AI Industry Alliance, which has pushed to wrest leadership of standards setting in some AI applications away from industry-led standardization efforts.¹¹² Excessive government intervention has been a long-standing weakness of China’s standardization system, producing standards not attuned to market demands and bureaucratic rivalries that undermine the convergence of standards.¹¹³ Wang Ping, a leading authority on this topic, has argued that China needs to reform its standardization system to allow private standards development organizations more space to operate, like the Institute of Electrical and Electronics Engineers in the United States and the European Committee for Electrotechnical Standardization.¹¹⁴

In sum, the United States is better positioned than China to not only broaden its pool of AI engineering skills but also benefit from academia-industry linkages in AI engineering. In previous industrial revolutions, these types of institutional adaptations proved crucial to technological leadership. Still, much uncertainty remains in forecasts about GPT skill infrastructure for AI, especially with regard to determining the right measures for the right AI talent. Recent studies of market demand for AI- and ICT-related jobs suggest that employers are softening their demands for a four-year degree in computer science as a requirement for such positions.¹¹⁵ Certificate programs in data science and machine learning that operate below the bachelor’s-degree level could play an important role in expanding the pool of AI engineering talent.¹¹⁶ Taking into account these caveats, this section’s evaluation of GPT skill infrastructure at the very least calls into question sweeping claims that China is best placed to capitalize on the IR-4.

Reframing National AI Strategies

The preceding conclusions offer a marked contrast with how American and Chinese policymakers devise national AI strategies. Policy proposals for pursuing US leadership in AI consistently call for more AI R&D as the highest priority. For example, the report “Meeting the China Challenge: A New American Strategy for Technology Competition,” published in 2020 by a working group of twenty-eight China specialists and experts, provided sixteen policy recommendations for how the United States should ensure its leadership in AI and three other key technological domains. The very first recommendation was for the United States to significantly expand investment in basic research, raising total R&D funding to at least 3 percent of GDP.¹¹⁷ The Trump administration’s “American AI Initiative,” launched to maintain US leadership in AI “in a time of global power competition,” also listed AI R&D spending as its very first policy recommendation.¹¹⁸

The Chinese government also prioritizes investments in R&D, sometimes at the expense of other routes to productivity growth oriented around technology adoption and education.¹¹⁹ China’s five-year plan (2021–2025) aims to raise basic research spending by over 10 percent in 2021, targeting AI and six other key technological domains.¹²⁰ China consistently sets and meets ambitious targets for R&D spending, but that same commitment has not extended to education funding. While China’s R&D spending as a percentage of GDP in 2018 was higher than that of Brazil, Malaysia, Mexico, or South Africa (other middle-income countries that industrialized on a similar timeline), China’s public expenditure on education as a percentage of GDP was lower than the figure in those countries.¹²¹ According to a group of experts on China’s science and technology policy, one possible explanation for this disparity in attention to R&D versus education is the longer time required for efforts in the latter to yield tangible progress in technological development.¹²²

As both the United States and China transition from initiating a new GPT trajectory to diffusing one across varied application sectors, investing in the broader AI-adjacent skill base will become more crucial than cornering the best and the brightest AI experts. Policies directed at widening the AI talent base, such as enhancing the role of community colleges in developing the AI workforce, deserve more attention.¹²³ Applied technology centers, dedicated field services, and other technology diffusion institutions can incentivize and aid adoption of AI techniques by small and medium-sized enterprises.¹²⁴ Reorienting engineering education toward maintaining and overseeing AI systems, not solely inventing new ones, also fits this frame.¹²⁵

A strategy oriented around GPT diffusion does not necessarily exclude support for the exciting research progress in a country’s leading labs and universities. R&D spending undoubtedly will not just help cultivate novel AI breakthroughs but also contribute to widening the GPT skill infrastructure in AI. All too often, however, boosting R&D spending seems to be the boilerplate recommendation for any strategic technology.¹²⁶ GPTs like AI are not like other technologies, and they demand a different toolkit of strategies.

Alternative Factors

In exploring how the IR-4 could bring about an economic power transition, it is important to compare the implications derived from the GPT diffusion mechanism to those that follow from explanations stressing other factors. Consistent with the previous chapters, I first consider threat-based explanations and the varieties of capitalism (VoC) approach. I then address how US-China competition over emerging technologies could be shaped by differences in regime type, a factor that is particularly salient in this case.

Threat-Based Explanations

One potentially dangerous implication of threat-based explanations is that war, or manufacturing the threat of war, is necessary for economic leadership in the IR-4. Crediting the US military’s key role in spurring advances in GPTs during the twentieth century, Ruttan doubts that the United States could initiate the development of GPTs “in the absence of at least a threat of major war.”¹²⁷ Extending Ruttan’s line of thinking to the US strategic context in 2014, Linda Weiss expressed concerns that the end of the Cold War, along with the lack of an existential threat, removed the impetus for continued scientific and technological innovation. Specifically, she questioned “why China has not yet metamorphosed into a rival that spurs innovation like the Soviet Union and Japan.”¹²⁸ Weiss only needed a little more patience. A few years later, the narrative of a US-China “Tech Cold War” gained momentum as both sides of the bilateral relationship trumped up threats to push national scientific and technological priorities.¹²⁹

GPT diffusion theory strongly refutes the notion that manufacturing external threats is necessary for the United States or China to prevail in the IR-4. An external menace did not drive the rise of the United States in the IR-2. Across all cases, military actors were involved in but not indispensable to spurring the development of GPTs, as many civilian entities also fulfilled the purported role of military investment in providing a large initial demand for incubating GPTs. Furthermore, threat-based interpretations extend only to the moment when one country stimulates the first breakthroughs in a GPT. Even if stoking fears can galvanize support for grand moonshot projects, these do not determine which country is able to benefit most from the widespread adoption of advances in GPTs like AI. That hinges on the more low-key toil of broadening the engineering skill base and advancing interoperability standards in GPTs—not fearmongering.

VoC Explanations

Applying the VoC framework to US-China competition in AI gives more ambiguous results. The VoC approach states that liberal market economies (LMEs), prototypically represented by the United States, are more conducive to radical innovation than coordinated market economies (CMEs).¹³⁰ It is unclear, however, whether China fits into the VoC framework as a CME or LME. While some label China a CME, others characterize it as an LME.¹³¹ This disputed status speaks to the substantial hybridity of China’s economy.¹³² China has been treated as a “white space on the map” of VoC scholarship, which was originally developed to classify different forms of advanced capitalist economies.¹³³ This makes it difficult to derive strong conclusions from VoC scholarship about China’s ability to adapt to the IR-4’s radical innovations.

The same holds if we focus on the skill formation aspect of the VoC framework. China’s education system emphasizes training for general skills over vocational skills.¹³⁴ In this respect, it is similar to LMEs like the United States, which means VoC theory provides limited leverage for explaining how the IR-4 could differentially advantage the United States or China. On this topic, GPT diffusion theory points to differences in AI engineering education as more significant than distinctions based on the CME-LME models.

Case-Specific Factors: Regime Type

What is the effect of regime type on technological leadership in the IR-4? The distinction between authoritarian China and the democratic United States takes center stage in arguments about the future of great power rivalry.¹³⁵ Regime type could also influence the specific aspect of great power competition that GPT diffusion tackles—whether one great power is able to sustain productivity growth at greater rates than its rivals by taking advantage of emerging technologies. Some evidence suggests that, owing to investments in inclusive economic institutions, democracies produce more favorable conditions for growth than autocracies.¹³⁶ Additionally, empirical work shows that democracies outgrow autocracies in the long run because they are more open to absorbing and diffusing new techniques.¹³⁷ Other studies find, more specifically, that internet technologies diffuse faster in democracies, possibly because nondemocracies are threatened by the internet’s potential to empower antigovernment movements.¹³⁸

On the other hand, the impact of democracy on technological progress and economic growth is disputed. Drawing on data from fifty countries over the 1970–2010 period, Taylor finds that regime type does not have a strong relationship with national innovation rates, as measured by patenting rates.¹³⁹ One review of the econometric evidence on democracy and growth concludes that “the net effect of democracy on growth performance cross-nationally over the last five decades is negative or null.”¹⁴⁰ Moreover, China’s rapid economic growth and adoption of internet technologies stand out as an exception to general claims about a democratic advantage when it comes to leveraging new technologies as sources of productivity growth. Contrary to initial expectations, incentives to control online spaces have made some autocratic regimes like China more inclined to spread internet technologies.¹⁴¹ Other scholars point out that the stability of China’s authoritarian regime has encouraged substantial contributions to R&D and technical education, the type of investments in sustained productivity growth typically associated with democracies.¹⁴²

It is not within the scope of this chapter to settle these debates.¹⁴³ Still, the juxtaposition of the GPT and LS mechanisms does speak to how regime type could shape US-China competition in the IR-4. Though the conventional wisdom links democracy to freewheeling thought and capacity for innovation, the most important effects of regime type in US-China technological competition during the IR-4, under the GPT diffusion framework, may materialize through changes in GPT skill infrastructure. Democracies tend to be more politically decentralized than autocracies, and decentralized states could be more responsive to the new demands for engineering skills and knowledge in a particular GPT. This accords with evidence that new technologies consistently diffuse more quickly in decentralized states.¹⁴⁴

Summary

How far can we take GPT diffusion theory’s implications for the US-China case? I have presented support for the GPT mechanism across a range of historical case studies, each of which covers at least four decades and two countries. At the same time, it is necessary to acknowledge limitations in translating lessons from past industrial revolutions and power transitions to the present.

To begin, it is important to clarify that my findings directly address the mechanism by which technological breakthroughs enable China to surpass the United States in economic productivity.¹⁴⁵ The scenario in which China overtakes the United States as the most powerful economy is different from one in which the US-China power gap narrows but does not fully disappear. Scholars have rightly noted that the latter scenario—“posing problems without catching up,” in the words of Thomas Christensen—still significantly bears on issues such as Taiwan’s sovereignty.¹⁴⁶ Even so, the possibility of China fully closing the power gap with the United States is especially crucial to study. When rising and established powers are close to parity, according to power transition theory, the risk of hegemonic war is the greatest.¹⁴⁷ China’s ability to sustain economic growth also affects its willingness and ability to exert influence in the international arena.

Next, GPT diffusion theory speaks to only one pathway to productivity leadership. The historical case studies have demonstrated that institutional responses to disruptive technological breakthroughs play a key part in economic power transitions. However, China’s prospects for long-term economic growth could also hinge on demographic and geographic drivers.¹⁴⁸

A number of factors could affect whether lessons from the past three industrial revolutions extend to the implications of present-day technological advances for a US-China power transition. The most plausible transferability issues can be grouped into those that relate to the nature of great power competition and those that relate to the nature of technological change.

First, as put forward by Stephen Brooks and William Wohlforth, China’s rise in the twenty-first century could face structural barriers that did not exist in previous eras.¹⁴⁹ Relying in part on data from 2005–2006, they argue that the present gap between the United States and China in military capabilities, as captured in long-term investments in military R&D, is much larger than past gaps between rising powers and established powers.¹⁵⁰ Arguably, the US-China gap in military expenditures has narrowed to the extent that comparisons to historical distributions of military capabilities are more viable. In 2021, China accounted for 14 percent of global military expenditures. This updated figure, albeit still much lower than the US share (38 percent), reflects China’s military modernization efforts over the past two decades during a time of declining US military spending.¹⁵¹ This ratio is more comparable to distributions of military capabilities in the historical periods analyzed in the case studies.¹⁵²

Another structural barrier China faces is that the growing complexity of developing and deploying advanced military systems now makes it more difficult for rising powers to convert economic capacity into military capacity than it was in the past.¹⁵³ Even so, there are a few reasons why it is still relevant to study China’s ability to convert the technological breakthroughs of the IR-4 into sustained productivity growth. To start, rising states could still benefit from the steady diffusion of some complex military technologies connected to advances in the commercial domain, such as armed uninhabited vehicles.¹⁵⁴ In addition, military effectiveness does not solely derive from extremely complex systems like the F-22 stealth fighter. Converting production capacity to military strength could be more relevant for China’s investments in asymmetric capabilities and those suited for specific regional conflicts, such as ground-based air defense systems and the rapid replacement of naval forces.¹⁵⁵ Lastly, there remains a strong connection between economic development and countries’ capabilities to “produce, maintain, and coordinate complex military systems.”¹⁵⁶

As for the second set of transferability issues, the technological landscape itself is changing. Accelerated globalization of scientific and technological activities may reduce the likelihood of adoption gaps between advanced economies when it comes to emerging technologies.¹⁵⁷ Despite these considerations, there are also compelling reasons to think that the nature of technological change in this current period only magnifies the importance of GPT diffusion theory. Cross-country studies indicate that while new technologies are spreading between countries faster than ever, they are spreading to all firms within a country at increasingly slow rates. Networks of multinational firms at the global technology frontiers have reduced cross-national lags in the initial adoption of new technologies, but the cross-national lags in the “intensive adoption” of new technologies, as measured by the time between a technology’s initial adoption and its intensive penetration throughout a country, have only grown.¹⁵⁸ These trends give more weight to the GPT mechanism.

Finally, even if the rise and fall of great technologies and powers is fundamentally different in the twenty-first century, previous industrial revolutions still exert substantial influence in the minds of academics and policymakers.¹⁵⁹ To justify and sustain their agendas, influential figures in both the United States and China still draw on these historical episodes. At the very least, this chapter submits different lessons to be learned from these guiding precedents.

When some of the leading thinkers of our era declare that the AI revolution will be more significant than the industrial revolution, it is difficult to not get caught up in their excitement. Somehow, every generation winds up believing that their lives coincide with a uniquely important period in history. But our present moment might not be so unprecedented. Unpacking how AI could influence a possible US-China power transition in the twenty-first century requires first learning the lessons of GPT diffusion from past industrial revolutions.

8 Conclusion

STUDIES OF HOW TECHNOLOGY interacts with the international landscape often fixate on the most dramatic aspect of technological change—the eureka moment. Consistent with this frame, the standard explanation for the technological causes of economic power transitions emphasizes a rising power’s ability to dominate profits in leading sectors by generating the first implementation of radical inventions. This book draws attention, in contrast, to the often unassuming process by which an innovation spreads throughout an economy. The rate and scope of diffusion are particularly relevant for GPTs—fundamental advances like electricity or AI that have the potential to drive pervasive transformation across many economic sectors.

Based on the process of GPT diffusion, this book puts forward an alternative theory of how and when significant technological breakthroughs generate differential rates of economic growth among great powers. When evaluating how technological revolutions affect economic power transitions, GPTs stand out as historical engines of growth that can provide major boosts to national productivity. Though each is different, GPTs tend to follow a common pattern: after multiple decades of complementary innovations and institutional adaptations, they gradually diffuse across a broad range of industries. Everything, everywhere, but not all at once.

This impact pathway markedly diverges from existing theories based on leading sectors. Akin to a sprint on a narrow lane, great power competition over leading sectors is framed as a race to dominate initial breakthroughs in the early growth periods of new industries. In contrast, GPT diffusion theory proposes that by more effectively adopting GPTs across many application sectors, some great powers can sustain higher levels of productivity growth than their competitors. Like a marathon on a wide road, great power competition over GPTs is a test of endurance.

Disruptive technological advances can bring about economic power transitions because some countries are more successful at GPT diffusion than others. A nation’s effectiveness at adapting to emerging technologies is determined by the fit between its institutions and the demands of those technologies. Thus, if economic power transitions are driven by the GPT trajectory, as opposed to LS product cycles, the institutional adaptations that matter most are those that facilitate information exchanges between the GPT sector and application sectors, in particular the ability of nations to widen the engineering skill base linked to a new GPT.

Three historical case studies, designed and conducted in a way to assess the explanatory power of the GPT mechanism against the LS mechanism, provide support for GPT diffusion theory. The case studies cover periods characterized by both remarkable technological change—the “three great industrial revolutions” in the eyes of some scholars—and major fluctuations in the global balance of economic power.¹ Overall, the case study evidence underscores the significance of GPT diffusion as the key pathway by which the technological changes associated with each industrial revolution translated into differential rates of economic growth among the great powers.

In the case of Britain’s rise to economic preeminence during the First Industrial Revolution, expanded uses of iron in machine-making spurred mechanization, the key GPT trajectory. The gradual progression of mechanization aligned with the period when Britain’s productivity growth outpaced that of France and the Netherlands. Britain’s proficiency in adopting iron machinery across a wide range of economic activities, rather than export advantages from dominating innovation in leading sectors such as cotton textiles, proved more central to its industrial ascent. Though its industrial rivals boasted superior systems of higher technical education for training expert scientists and top-flight engineers, Britain benefited from mechanics’ institutes, educational centers like the Manchester College of Arts and Sciences, and other associations that expanded access to technical literacy and applied mechanics knowledge.

The Second Industrial Revolution case also favors the GPT mechanism’s explanation of why certain great powers better adapt to periods of remarkable technological change. The LS mechanism focuses on Germany’s discoveries in new science-based industries, such as chemicals, as the driving force behind its catching up to Britain before World War I. However, the United States, emerging as the preeminent economic power during this period, was more successful than Germany in exploiting the technological opportunities of the Second Industrial Revolution. Enabled by innovations in machine tools, the extension of interchangeable manufacturing techniques across many American industries functioned as the key GPT trajectory that fueled the rise of the United States. Scientific infrastructure or industrial R&D capabilities, areas in which the United States lagged behind its industrial rivals, cannot account for its advantage in adopting special-purpose machinery across nearly all branches of industry. Rather, the United States gained from institutional adaptations to widen the base of mechanical engineering talent, including through the expansion of technical higher education schools and the professionalization of mechanical engineering.

Evidence from the US-Japan rivalry amid the information revolution exposes more gaps in the LS account. During the late twentieth century Japan captured global market shares in new fast-growing sectors such as consumer electronics and semiconductor components, prompting many to predict that it would overtake the United States as the leading economic power. Yet such an economic power transition, an inevitability based on the expectations of the LS mechanism, never occurred. Instead, the United States sustained higher rates of economic growth than Japan owing, in part, to greater spread of computerization across many economic sectors. Japan’s productivity growth kept up with the US rate in sectors that produced information technology but lagged far behind in sectors that intensively used information technology. Once again, differences in institutional adaptations to widen the GPT skill base turned out to be significant. While Japanese universities were very slow to adapt their training to the demand for more software engineers, US institutions effectively broadened the pool of such skills by cultivating a separate discipline of computer science.

As a supplement to the case studies, I conducted a large-n statistical analysis to test whether countries with superior GPT skill infrastructure preside over higher rates of GPT diffusion. Leveraging time-series cross-sectional data on software engineering education and computerization rates in nineteen countries (the G20 economies) across twenty-five years, the quantitative analysis confirmed this crucial expectation derived from GPT diffusion theory. I found less support for other factors often assumed to have a positive effect on an economy’s broader technological transformation, including institutional factors linked to securing LS product cycles. This empirical test validates a core component of GPT diffusion theory across a sample of the world’s major emerging and developed economies.

Main Contributions

First, at its core, Technology and the Rise of Great Powers introduces and defends GPT diffusion theory as a novel explanation for how and when technological change can lead to a power transition. Historical case studies and statistical analysis substantiate the explanatory power of GPT diffusion theory over the standard explanation of technology-driven power transitions based on leading sectors, which exerts enduring influence in policy and academic circles.² In doing so, the book answers the call by scholars such as Michael Beckley and Matthew Kroenig for the international relations field to devote more attention to the causes of power transitions, not just their consequences.³

By expounding on the significance of GPT skill infrastructure, the book points toward next steps to better understanding the politics behind some of the most significant technological advances in human history. To investigate why some countries are more successful in cultivating GPT skill infrastructure, promising avenues of research could tap into existing work that concentrates on centralization, inclusiveness of political institutions, government capacity to adopt long time horizons, and industrial organization.⁴ In these efforts, being careful to differentiate between various pathways by which technological changes make their mark, as this book does with the GPT and LS mechanisms, will be especially important when underlying political factors that satisfy the demands of one type of technological trajectory run counter to the demands of another.

Future work should also probe other institutional factors beyond GPT skill infrastructure that contribute to cross-national differences in GPT adoption. This opens up a universe of institutions that are often ignored in innovation-centered accounts of technological leadership, including gender gaps in engineering education,⁵ transnational ethnic networks that facilitate technology transfer,⁶ and “technology diffusion institutions,” such as standard-setting organizations and applied technology centers.⁷

In positioning the LS mechanism as the main foil to the GPT mechanism, my intention is to use this clash between theories to productively advance our understanding of the rise and fall of great technologies and powers. Contestation should not be misread as disparagement. In one sense, GPT diffusion theory builds on previous scholarship about leading sectors, which first identified the need to flesh out more specific linkages between certain technological advances and more highly aggregated economic changes in the context of power transitions.⁸ Testing, revising, and improving upon established theories is essential to gradual yet impactful scientific progress—not so unlike the incremental, protracted advance of a GPT.

Second, the book’s central argument also suggests revisions to assessments of power in international politics. Recognizing that scientific and technological capabilities are becoming increasingly central to a nation’s overall power, researchers tend to equate technological leadership with a country’s ability to initiate “key ‘leading sectors’ that are most likely to dominate the world economy into the twenty-first century.”⁹ For instance, an influential RAND report, “Measuring National Power in the Postindustrial Age,” proposes a template for measuring national power based on a country’s capacity to dominate innovation cycles in “leading sectors.”¹⁰ In this effort, the authors draw directly from LS-based scholarship: “The conceptual underpinnings of this template are inspired by the work of Schumpeter, Rostow, Gilpin, Kennedy, and Modelski and Thompson.”¹¹ This study has gained considerable traction in academic and policymaking circles, inspired further workshops on national power, and has been called “the definitive US study on CNP [Comprehensive National Power].”¹²

Contrary to these approaches, this book submits that evaluations of scientific and technological power should take diffusion seriously. Assessments that solely rely on indicators of innovation capacity in leading sectors will be misleading, especially if a state lags behind in its ability to spread and embed innovations across productive processes. A more balanced judgement of a state’s potential for technological leadership requires looking beyond multinational corporations, innovation clusters like Silicon Valley, and eye-popping R&D numbers to the humble undertaking of diffusion. It shines the spotlight on a different cast of characters: medium-sized firms in small towns, engineers who tweak and implement new methods, and channels that connect the technological frontier with the rest of the economy.

In an article published in the Review of International Political Economy journal, I illustrated the value of this diffusion-oriented approach in gauging China’s scientific and technological capabilities.¹³ Preoccupied with China’s growing strength in developing new-to-the-world advances, existing scholarship warns that China is poised to overtake the United States in technological leadership. This is mistaken. There is still a large gap between the United States and China when it comes to the countries’ readiness to effectively spread and utilize cutting-edge technologies, as measured by penetration rates of digital technologies such as cloud computing, smart sensors, and key industrial software. When the focus shifts away from impressive and flashy R&D achievements and highly cited publications, China’s “diffusion deficit” comes to light. Indeed, a diffusion-centric assessment indicates that China is much less likely to become a scientific and technological superpower than innovation-centric assessments predict.

Relatedly, the GPT diffusion framework can be fruitfully applied to debates about the effects of emerging technologies on military power. Major theories of military innovation focus on relatively narrow technological developments, such as aircraft carriers, but the most consequential military implications of technological change might come from more fundamental advances like GPTs. In an article that employs evidence from electricity’s impact on military effectiveness to analyze how AI could affect the future of warfare, Allan Dafoe and I challenge studies that predict AI will rapidly spread to militaries around the world and narrow gaps in capabilities.¹⁴

Third, as chapter 7 spells out in detail, GPT diffusion theory provides an alternative model for how revolutionary technologies, in particular AI, could affect the US-China power balance. This, in turn, implies different optimal policies for securing technological advantage. Drawing on the LS template, influential thinkers and policymakers in both the United States and China place undue emphasis on three points: the rapid timeframe of economic payoffs from AI and other emerging technologies; where the initial, fundamental innovations in such technologies cluster; and growth driven by a narrow range of economic sectors.

GPT diffusion theory suggests diverging conclusions on all three dimensions. The key technological trajectory is the relative success of the United States and China in adopting AI advances across many industries in a gradual process that will play out over multiple decades. It will be infeasible for one side to cut the other off from foundational innovations in GPTs. The most important institutional factors, therefore, are not R&D infrastructure or training grounds for elite AI scientists but rather those factors that widen the skill base in AI and enmesh AI engineers in cross-cutting networks with entrepreneurs and scientists.¹⁵

Yet, the United States is fixated on dominating innovation cycles in leading sectors. When it comes to their grand AI strategy, US policymakers are engrossed in ensuring that leading-edge innovations do not leak to China, whether by restricting the immigration of Chinese graduate students in advanced technical fields or by imposing export controls on high-end chips for training large models like GPT-3 and ChatGPT.¹⁶ A strategy informed by GPT diffusion theory would, instead, prioritize improving and sustaining the rate at which AI becomes embedded in a wide range of productive processes. For instance, in their analysis of almost 900,000 associate’s degree programs, Center for Security and Emerging Technology researchers Diana Gehlhaus and Luke Koslosky identified investment in community and technical colleges as a way to unlock “latent potential” in the US AI talent pipeline.¹⁷ This recommendation accords with an OECD working paper on the beneficial effects of a wider ICT skills pool on digital adoption rates across twenty-five European countries. The study finds that “the marginal benefit of training for adoption is found to be twice as large for low-skilled than for high-skilled workers, suggesting that measures that encourage the training of low-skilled workers are likely to entail a double dividend for productivity and inclusiveness.”¹⁸

At the broadest level, this book demonstrates a method to unpack the causal effects of technological change on international politics. International relations scholars persistently appeal for the discipline to better anticipate the consequences of scientific and technological change, yet these demands remain unmet. By one measure, between 1990 and 2007, only 0.7 percent of the twenty-one thousand articles published in major international relations journals explicitly dealt with the topic of science and technology.¹⁹ One bottleneck to researching this topic, which Harold Sprout articulated back in 1963, is that most theories either grossly underestimate the implications of technological advances or assume that technological advance is the “master variable” of international politics.²⁰

This book shows that the middle ground can be a place for fruitful inquiry. Technology does not determine the rise and fall of great powers, but some technological trends, like the diffusion of GPTs, do seem to possess momentum of their own. Social and political factors, as represented by GPT skill infrastructure, shape the pace and direction of these technological trajectories. This approach is particularly useful for understanding the effects of technological change across larger scales of time and space.²¹

2024-12-05

康拉德·黑塞：论宪法规范力

一、实际上的宪法和法律上的宪法

1862年4月16日，费迪南·拉萨尔（Ferdinand Lassalle）在追求进步和自由的柏林地区协会发表了关于立宪主义的演讲。[1]他的基本论点为，宪法问题本就不是法律问题，而是权力问题。因为一国宪法是该国实际存在的权力关系的体现：体现为军队的军事力量，体现为大地主影响力的社会力量，体现为大规模工业和资本的经济力量。此外，虽然精神力量与上述力量不属于同一类别，但也体现在一般意识和普通教育中。以上因素的相互作用，是决定社会上一切法律和法律制度的推动力量，使法律和法律制度不能与其本质存在根本的不同，因此这些因素都是一国“实际上的宪法”（wirkliche Verfassung）。用拉萨尔的话来说，那些通常被称为宪法的内容，即“法律上的宪法”（rechtliche Verfassung），只是一纸空文而已。只有与实际上的宪法相一致时，法律上的宪法才能发挥激励和规范的作用。否则，二者就会发生无法避免的冲突。从长远来看，法律上的宪法，仅是一纸空文，必然会屈服于国家实际存在的权力关系。

不管是政治家还是律师都这样教育我们：宪法问题本就不是法律问题，而是权力问题。在拉萨尔发表以上观点40年后，格奥尔格·耶利内克（Georg Jellinek）发表了如下观点：“宪法的发展为我们提供了一个巨大的教训，但是其巨大的意义仍然没有得到足够的重视，即法律条文无法实际控制国家权力的分配。真正的政治力量是按照独立于所有法律形式的规律运行的。”[2]在当下，这种思想显然没有过时，其只是被简化了，以显性或隐性的方式仍然存在，因为被拉萨尔看作权力的决定性因素之一的一般意识和普通教育已经完全退居幕后。这种想法似乎更加吸引人，它是如此显而易见；它似乎相当清醒地站在事实的基础上，把所有的幻想抛到一边；它似乎已被历史经验所证实，宪法的历史似乎确实告诉我们，在日常生活的政治斗争中，就像在国家生活的决定性问题上，政治现实的力量总是大于法律规范的力量，规范性总是不得不让位于现实性。我们只需要回顾一下拉萨尔在前述演讲中论及的普鲁士预算冲突，令耶利内克发出前述颓丧之言的议会政治地位的变化，或者《魏玛宪法》自始就无可辩驳的失败。

就后果来看，现实条件具有决定性作用意味着：法律上的宪法发挥作用的前提是现实与规范的完全一致，但这只是一种假设的特殊情形。（原则上，静态的、理性的规范和运动的、非理性的现实之间存在着一种无法消除的内在张力。）因此，既然根据上述观点，宪法的冲突状态是持续存在的：即在其本质性构成部分上，也即非纯技术性构成部分上，法律上的宪法受制于实际上的宪法。那么，关于实际上的宪法具有决定性作用的观点，无非是对法律上的宪法的否定。人们可以用鲁道夫·索姆（Rudolf Sohm）的著名短语[3]的变体说：宪法法与宪法本质相矛盾。[4]

对宪法的这种否定也就包含了对宪法学作为一门法律科学的价值的否定。宪法学与所有的法律科学一样，是一门规范科学，这使它区别于作为纯粹现实科学的政治社会学和政治学。如果宪法规范等同于对不断变化的事实关系的反映，那么宪法学就会成为一门没有法的法学。最终，它除了反复陈述和评论现实政治所创造的事实之外，将没有其他任务。在这种情况下，宪法学研究不是为使命性的公正的国家秩序服务，而是为了对既有权力关系进行法律正当化辩护。倘若真如上述所言，对于一个学科而言是不光彩的。如果接受此种对于宪法的否定态度，如果将实际上的宪法视为唯一的决定性因素，那么宪法学就失去了作为规范科学的特征，从而变成了一门纯粹的事实科学，也就不再区别于社会学或政治学。

如果宪法的确仅仅是对事实权力的表达，那么这种对宪法的否定以及对宪法学作为一门法律科学的价值的否定是合理的。如果宪法自身有约束国家生活的力量，即使是有限的，那么前面的否定就失去了基础，由此便引出了“宪法规范力”的问题。除了事实的情况与特定的政治和社会现实因素的决定性力量之外，是否也存在宪法的决定性力量呢？它的基础是什么？它的范围有多大？宪法学人基于职业本位认为决定国家生活进程的主要是法律，这难道不是一种虚构吗？毕竟国家生活进程实际上是由完全不同的力量决定的。这些问题在宪法领域尤为突出，因为与法律体系的其他领域相比，宪法缺乏执行其规范的外部保障。对于这一问题的解答，关乎法律上的宪法这一观念的存废，也关乎作为一门规范科学的宪法学的存废。

二、宪法规范力的可能与界限

寻找上述问题的答案的尝试，必须以法律上的宪法与政治和社会现实的相互限制为出发点；[5]还必须进一步考虑法律上的宪法在这一出发点下发挥作用的限度和可能性；最后，也必须追问发挥这种作用的先决条件。

1.只有把法律上的宪法与现实条件结合在一起，并且在两者不可分割的联系以及相互限制中，才能认识到法律规范在具体现实中的意义。任何孤立的观点，即只注意到一方的观点，并不能得出答案。对于只注意到法律规范的人来说，规范只能“适用”或“不适用”，没有其他可能；对于只注意到政治和社会现实的人来说，要么是忽略了这个问题，要么是倾向于忽视法律规范化的意义。

虽然法律上的宪法与政治和社会现实的相互限制这一出发点是不言自明的，但仍需要对其予以特别强调。因为保罗·拉班德（Paul Laband）和格奥尔格·耶利内克学派的“法律实证主义”，以及卡尔·施米特（Carl Schmitt）宪法理论的“社会学实证主义”，[6]在很大程度上都是以这种规范与现实的二分法为特征的，[7]而且这种思维的影响甚至到现在都没有被克服。在宪法中，现实与规范、实然与应然的分离被认为是不可逾越的。正如论者曾多次指出的，[8]此种分离或明或暗地证实了事实关系是唯一决定性力量的命题。[9]甚至每次向某一方向的重点转移，都几乎不可避免地走向榨干现实的规范或榨干规范的现实的两种极端。因此，有必要在放弃规范性与排挤事实性之间寻找一条道路。只有当我们避免在原则性的非此即彼的意义上回答所提出的问题时，才能找到这条道路。

宪法规范并非独立于现实而存在。宪法规范的本质寓于效力之中，也即宪法所规范的状态必然要实现于现实之中。这一有效性要求不能脱离实现它的历史条件，这些历史条件处于多种相互依存的关系中，创造了不能忽视的特殊规律。这里的历史条件包括自然的、技术的、经济的和社会的条件。只有考虑到这些条件，宪法规范力的要求才能得到实现。同样，这些条件也包括在一个民族中已经成为现实精神的内容，即具体的社会观念和价值观念，这两者对于法律命题的效力、理解和权威性具有决定性的影响。

但是，宪法规范的效力要求并不等同于其实现条件，相反，它作为一个单独的要素与这些条件同列。因此，宪法不仅是一种实然的表达，也是一种应然的表达。宪法不仅仅是对影响其效力的实际条件——政治和社会现实——的反映，而且凭借其效力要求，宪法规范尝试规范和塑造政治和社会现实。宪法受到这些现实条件的限制，同时又反过来限制着它们。因而，宪法规范不能被追溯到某一原则，既不能追溯到纯粹的规范性，也不能追溯到纯粹的政治的、社会的或经济的条件。现实的限定性和宪法的规范性只能加以区分，但不能相互分离，也不能相互等同。

2.因此，实际上的宪法和法律上的宪法处于交互关系之中。[10]二者相互联系，但并不相互依存；相反，法律上的宪法具有独立的意义，即使只是相对而言。在生成国家现实性的力量场中，宪法的效力要求也是一种必要要素。宪法的效力要求在多大程度上得以实现，宪法就具备多大程度的规范力。这就会导致进一步的问题，即在此背景下，宪法规范力实现的可能性和局限性的问题。

如前所述，对宪法规范力实现的可能性和局限性的分析，仅能源于对法律上的宪法的实效性的深刻把握。这绝非前人未见之论。这一点对于立宪主义国家理论而言是不言而喻的，将宪法从国家现实的整体中分离出来的观念对其而言也是匪夷所思的。这一点在威廉·洪堡（Wilhelm von Humboldt）的政治著作中有着最明确的表述。

洪堡在他早期的一篇著作中曾写道：“任何国家的宪法，如果仅由理性按照既定的计划先行制定，都不会取得成功；只有在更强大的偶然性与对立的理性的斗争中产生的宪法，才会生机勃勃。”换言之，这样的宪法能够与具体的历史条件相联系，并将其实现条件与以理性为标准的法律规范相结合。洪堡继续写道，“……只有立足于当下所有的独特条件，才能有所建树。理性所欲贯彻的蓝图，尚需从理性所欲加工的对象方面得到规范与修正。如此，宪法方得历久弥新、利国利民。反之，宪法即便得到了施行，其也必将永远徒劳无功。理性或许有塑造现有物质的能力，但绝无产生新物质的力量。这种力量只存在于事物的本质之中，真正起作用的正是这种力量，真正明智的理性只是激发这种力量的作用，并设法引导它们。在这一点上，理性是谦虚的。宪法不能像树苗嫁接到树上那样嫁接到人的身上。在时间和自然没有预先发挥作用的地方就有类似的行为，就好像用线把花绑在树上一样，正午的第一缕阳光就会灼伤它们。”[11]

1813年12月，洪堡在关于德国宪法的备忘录中进一步阐发了相关观点：“宪法是这样一种东西：其存在于生活之中，人们可以感受到它的存在，但却无法完全理解其起源，因而很难对其加以效仿。每一部宪法，即使仅仅被看作是一种理论结构，也必须在时间、环境、民族性格中找到其生命力的物质萌芽，而这种萌芽只需要据此继续发展；想要纯粹根据理性和经验的原则来建立宪法的想法是非常荒谬的。”[12]

洪堡首先明确了宪法规范力的局限性：宪法——这里指“法律上的宪法”——如果不想“永远贫瘠”，就不能不顾历史环境及其力量，只是抽象地、理论地构建国家，宪法无法生成那些不是当下独特条件所固有的东西。在缺乏上述先决条件的地方，宪法就不能赋予“形式和变化”得以继续发展；在不能发挥事物本质力量的地方，宪法也不能引导这种力量；在宪法无视其时代的精神、社会、政治或经济规律的地方，宪法就缺乏其生命力的不可或缺的萌芽，并且无法确保其逆于这些规律而设置的状态能够出现。

但与此同时，这也决定了宪法的生命力和作用力的性质和界限。当宪法有能力面向未来塑造当前独特的条件下所固有的情况时，宪法规范就能产生效力。正如洪堡所表述的那样，当宪法由必要性原则决定时，宪法就能获得力量和威望。[13]换言之，宪法的生命力和作用力的基础寓于如下方面：宪法必须与时代的自发力量和活力倾向相结合；宪法必须确保这些力量得以施展并合理地安排其相互间的关系秩序；宪法必须是由对象（即事实）决定的具体生活条件的总体秩序。然而，宪法规范力并不仅仅基于对既定事物的灵活适应。[14]法律上的宪法可能会成为独自发挥作用的力量，但其前提是这种力量是当下独特条件所固有的东西。法律上的宪法本身不能完成任何事情，永远只能设置一个任务。然而，当人们承担起这项任务时，当人们愿意让宪法所规范的秩序来决定自己的行为时，当人们面对一时功利考量心生疑窦和抗拒并坚决贯彻宪法秩序时，也就是说，当在人们的普遍意识中，特别是在那些对于宪法运行身负重任者的意识中，活跃着的不仅仅是权力意志，而且还有尊崇宪法的意志时，宪法就会成为一种有效的力量。

尊崇宪法的意志源于三个方面：基于对不可动摇的、客观的和规范的秩序的必要性和内在价值的洞察，确信是这种秩序使国家生活摆脱无节制和无形式的恣意状态；基于这样一种信念，即宪法所构成的秩序不仅是一种事实秩序，而且是已被合法化，并将永远被重新合法化的秩序；还基于这样一种认识，即这种秩序不像逻辑规律那样独立于人的意志而存在，而是只能通过人的意志行为加以实施和维持。[15]这种意志之所以能够起作用，是因为国家生活，就像人的全部生活一样，不仅受制于看似不可避免的力量，人们也总被要求对其加以积极地塑造，设定其应担负的任务并完成这些任务。如果我们对国家生活中始终存在的使命性的这一面视而不见，那么我们的思想就会变得贫乏且危险。我们将不可避免地忽视现实的整体性和特殊性，这不仅是一个不可避免的现实问题，而且是一个使命性的秩序问题，即规范问题。

3.宪法规范力的本质和作用在于激发和引导事物本质中的力量，而且宪法本身也是一种自身有效的力量。如前所述，这是其局限性的根源，然而，这也是使宪法展现出最佳规范力的先决条件，这就涉及宪法的内容和宪法的实践。笔者将极尽简要地概述其中一些最重要的条件：

（1）宪法的内容越是植根于其所处的环境，其规范力的实现就越有保障。

因此，宪法规范力最基本的前提——这一点从上述内容中可以明显看出——是其不仅考虑社会、政治或经济方面的规律性，而且最重要的是反映其所处时代的精神情况，这能使其作为一种适当且公正的秩序得到普遍意识的肯定和支持。

然而，同样地，宪法必须也能够适应这些条件的变化。除了纯粹的组织技术条款之外，宪法必须尽可能地局限于一些基本原则。鉴于如今社会和政治现实的变化越来越快，这些原则的具体实现形态也应该不断地得到新的、但同时也是根据这些基本原则的发展。[16]另一方面，任何一时的或特殊的利益在宪法中的——借用一种人们喜闻乐见的表述——“有宪法效力的固定化”，都不可避免地使宪法必须经常进行修改，进而贬损其规范力。

最后，在不断变化的政治和社会现实中，宪法绝不能仅仅凭借着某一片面的模式保持其生命力和规范力。宪法如欲保持其基本原则的规范力，就必须在慎重考虑的基础上吸收对立模式的要素。没有不具有约束性的基本权利，没有不具有权力集中可能性的三权分立，没有不具有一定程度的单一性要素的联邦制。如果宪法试图完全纯粹地实现这些原则，那么，至迟到下文出现的国家紧急状态将会表明，宪法规范力的边界已被逾越，并将被现实所取代，宪法试图实现的原则将被彻底废弃。

（2）宪法规范力的最佳发展不仅仅是宪法内容的问题，也是宪法实践的问题。这里的决定性因素是所有参与宪法生活的人的态度，也就是前文所述的“尊崇宪法的意志”，这一点的重要性体现在各个方面。

所有一时的权宜之计，即使能够实现，也无法与虽然可能有所不利，但仍然坚持尊崇宪法所带来的不可估量的收益相提并论。正如瓦尔特·布克哈特（Walter Burckhardt）所说：“一个人所承认的宪法意志，必须诚实地得到坚持，即使必须为此放弃某些利益，甚至是某些正当的利益。倘若有人为了遵循宪法的要求而主动牺牲自身利益，那么他就强化了对于宪法的尊崇，同时也捍卫了一种对于国家而言——尤其是对于民主国家而言——不可或缺的善”，而凡是回避这种牺牲的人，“都是在用一种远远超过所有好处且一旦耗费就再也无法挽回的资本，换取些微蝇头小利。”[17]

最后，宪法解释对于维护和巩固宪法规范力具有决定性意义。宪法解释必须有助于宪法规范力的最佳化实现。显然，逻辑归纳或概念建构的方式无法满足这一要求。如果法律，尤其是宪法的规范力受制于具体的社会环境，那么解释工作就不能忽视这些现实条件。宪法解释必须充分考虑这些条件，并将其与宪法原则的规范性内容联系起来。最理想的解释，是那种立足于实际情况的具体条件且能够将规范性安排的意义发挥至最优的解释。

这意味着对宪法的解释可以甚至必须随着实际情况的变化而变化。但与此同时，通过解释促成宪法变迁，其界限在于宪法解释所受的规范性安排的意义约束。宪法命题的目的及其明确的规范意志决不能因事实的变迁而被牺牲。当规范性安排的意义在变化了的现实之中不再能够实现时，那就只剩修改宪法这一唯一的可能性。否则，就意味着取消规范与现实之间不可避免的紧张关系，从而取消法律本身。然而，在限制范围之内，续造性解释始终是可能并且必要的。这种灵活性正是宪法规范力的基本条件，因而也是宪法稳定性的基本条件。如果宪法缺乏这种灵活性，其与现有法律状态迟早会不可避免地彻底决裂。

三、宪法规范力的影响因素

1.总结如下：宪法受到历史与现实的制约。宪法不能脱离所处时代的具体社会环境，只有考虑到这些环境因素，宪法规范力的要求才能实现。然而，宪法不仅仅是对现实的反映，凭借其规范性因素，宪法也规范和塑造政治和社会现实。宪法规范力的可能性和局限性都源于“实然”与“应然”的交互性关系。

宪法能够赋予与之相关的现实以“形式和变化”。宪法能够激发“存在于事物本质中的力量”并发挥其作用。此外，宪法本身也能够成为一种积极的力量，在政治和社会现实中发挥作用并在一定程度上决定现实。宪法不受侵犯的观念越是深入人心，尤其是深入那些对于宪法运行身负重任者的意识中，这种力量就越能够在面对反抗时得到坚决贯彻。因此，宪法规范力的强弱首先受到尊崇规范的意志和尊崇宪法的意志的影响。

然而，在那些为宪法所规范者尚未内蕴于当下的独特条件之处，宪法规范力就会触及自身的界限。这些界限并不是固定不变的。因为，与自然的、社会的、经济的和其他的规律一样，尊崇宪法的意志也同样属于这一独特条件的要素。如果宪法基于特殊的力量而极富活力，宪法规范力或许能将其界限推至极远，但绝对无法完全消除这些界限。世界上没有任何力量，甚至宪法也不能改变特定的自然条件。但这仅仅意味着要紧的地方在于宪法的塑造性任务务必保持在这些界限之内。如果宪法符合这些有效性条件，即使是有能力突破或改变宪法规范的有权势者也必须遵守宪法，即使在困难时期宪法也不会失去其规范力，那么宪法就是一种活跃的力量，进而有能力保护国家生活不受无节制、不定型的恣意的暴虐。因此，检验宪法规范力是否得到维护并不是在安定祥和时，而是在紧急状态下。就此而言，这也正是卡尔·施米特著名论点的相对真理所在：紧急状态是决定宪法规范力的关键。但决定性的问题不在于紧急状态是否证明了事实性优于规范性这个次要意义，而在于规范性优于事实性的地位是否得到了维护。

2.由于迄今为止几乎没有人以这种形式讨论过这个问题，所有这一切只能意味着一个初步的、也必然是粗略的定位。然而，这种定位已经能够回答开头提出的问题。宪法绝不是拉萨尔所说的那张废纸；宪法也不像耶利内克教导我们的那样“无法实际控制国家权力的分配”，也不像自命不凡的自然主义者和社会学主义者仍想让我们相信的那样。宪法并非独立于其所处时代的具体历史环境，但其本身也不附属于历史环境。在现实条件与其规范内容发生冲突的情况下，宪法的规范内容不一定是较弱的一方。相反，在一些可实现的条件下，即使是在冲突的情况下，宪法也能保持其规范力。只有在这些条件无法满足的情况下，宪法问题才会变成权力问题，法律上的宪法才会屈服于实际上的宪法。但这一事实并不能成为彻底否定宪法的理由：宪法法与宪法本质并不矛盾。

与此同时，即使是如今，宪法学也无须退位。如果说法律上的宪法相对于实际上的宪法有其自身的意义，那么宪法学并没有丧失其作为一门法律科学的合法性，其并非狭义的社会学或政治学意义上的现实科学。当然，也不像实证主义所认为的那样，其仅仅是一门关于规范的科学。相反，由于其研究对象更加依赖于政治和社会现实，而且宪法规范的实施缺乏外部保障，因此其应兼容并包这两个方面，而且应当比其他法学学科更为如此。宪法的法律规范性与现实相关性的紧密联系迫使宪法学绝不能忽视规范性的条件，假如其不愿与研究对象失之交臂的话。如果要让宪法学的论述在现实面前站得住脚，当然就不能仅局限于用历史的、社会的、经济的或其他的方法对“严格法律”的思维进行外部补充。[18]相反，宪法学必须从根本上洞察所有决定国家生活进程的原则和力量之间的必然关系。因此，宪法学特别依赖于与之相邻的现实学科，如历史学、社会学和经济学。

然而，从以上论述中也可以看出，宪法学必须保持对其局限性的谦逊认知。因为宪法规范力只是生成国家现实的力量之一。而且，这是一种有限的力量，其效果取决于上述先决条件。此间的任务极为艰巨，因为宪法规范力的保障不是一劳永逸的，而是一种使命性的东西。只有在特定的条件之下，宪法规范力才能以最佳的方式得以实现。这种最佳的实现是宪法学研究活动方向的根本指针。较之不遗余力地找寻宪法问题本质上是权力问题的论据，力保宪法问题不成为权力问题才是宪法学真正有益的建树。

这意味着宪法学必须洞悉宪法规范力能够获得最佳效力的条件，必须发展宪法教义学，并从这一角度解释宪法条文。这意味着，宪法学的主要任务是强调、唤醒和维护尊崇宪法的意志，而尊崇宪法的意志是宪法规范力最可靠的保障；[19]这意味着，在必要时宪法学有义务挺身诤言——在国家生死攸关的问题上寄托于幻想，危莫大焉。

四、《德国基本法》的规范力

最后，我将通过审视（德国）当前的宪制状况来证明，我们应当意识到本文提出的问题。

有人可能认为，当今时代显然已经用清晰可见的方式驳斥了对法律上的宪法的质疑。事实上，似乎有许多迹象表明，与过去相比，如今法律上的宪法对于国家生活有着更为重大的意义。国内政治似乎在很大程度上被“法律化”了。在联邦与州的关系中，在国家机关的关系及其职能中，宪法论证和辩论发挥着主导作用。即使是为政治生活提供动力和方向的政党，也会受到法律秩序的约束，尽管其在本质上显然不易被法律规范。政治权力无权修改《德国基本法》中的基本原则，这意味着宪法原则高于人民主权原则。法律上的宪法压倒一切的重要性体现在宪法法院目前仍然未知的、几乎无限的管辖权上，宪法法院有权依法对有争议的案件乃至国家生活的基本问题，作出最后的裁决。此外，法律上的宪法渗透到了法律生活的各个领域，甚至渗透到了原本与宪法严格分离的民法领域，并且，在联邦最高法院的作用下，宪法被赋予了主导地位。

以上事实应予重视。但这也不能掩盖：我们仍旧面临，或许在很大程度上面临，宪法规范力的问题。如前所述，宪法规范力取决于宪法的实践和宪法的内容是否满足某些先决条件，而如今，这些先决条件只在有限的范围内存在。

前文所提及的“尊崇宪法的意志”对于宪法实践具有决定性作用。这一决定性作用不仅要在皇皇大处得到体现，更要在细微之处得到贯彻。批判性的观察者不难看出，当今时代，人们往往并不情愿为了宪法的规定而牺牲自身的利益。相反，人们乐于为了一些蝇头小利而出卖强化宪法之尊崇可以获得的收益。目前，《德国基本法》显然只是在有限的程度上扎根于（尊崇宪法的）普遍意识并得到其支持，这会使上述倾向变得更加危险。[20]

《德国基本法》的一系列规定，因其内容的缘故，使得宪法规范力同样面临着深刻的质疑。德意志联邦共和国宪制制度中存在的宪法与现实之间的紧张关系，经常会引起人们的关注。[21]最著名，尽管可能不是最重要的例子，是《德国基本法》第38条第1款。该款规定：“当选的联邦议员是全体人民的代表，不受任何指令的约束，只遵从自己的良知。”[22]在现代工业社会中，尤其是在现代人的生活态度发生深刻变化的情况下，自由原则逐渐成为一个严重的问题。[23]

在这种情况下，宪法原则的可能性和有效性在完全相反的潮流和趋势的现实面前是否仍然存在的问题，就摆到了我们面前。这些问题尚未涉及非常情况。与《魏玛宪法》不同的是，《德国基本法》迄今还没有在蓬勃的经济增长和相对稳定的政治条件下经受过严峻的考验。如上所述，对宪法规范力最大的考验是政治、经济或社会生活中出现的紧急情况，这些紧急情况无法通过正常的宪法责任和权力来补救。《德国基本法》没有准备好接受这种对其规范力的检验。[24]

众所周知，《德国基本法》根据《魏玛宪法》第48条的经验，取消了紧急状态的规定。在紧急状态下，《德国基本法》只包含了一些孤立的、有限的职责，其甚至不足以应对稍微严重的紧急情况。[25]紧急状态权的问题没有在1949年作出最终的决定，因为根据《占领法规》，[26]该问题属于占领国的保留事项之一。根据《波恩条约》[27]第5条第2款，只有当德国当局获得适当的法律授权，从而有能力应对严重扰乱公共安全和秩序的行为时，这项保留才会失效。

德国当局尚未获得授权，占领国的干预保留仍然存在。然而，只有在联邦共和国受到外部威胁或攻击时，这种干预保留才有意义。《波恩条约》第5条未提及对公共安全秩序或宪制生活造成严重威胁的其他情况，例如经济紧急状态。此外，占领国是否会在必要时行使干预保留也是个问题。因此，有一个事实是不可回避的，即除了上述例外情况，联邦德国没有关于紧急状态权的宪法规定。

紧急状态权的存在是使用这一权力的动力，当然也存在危险。但危险并不能证明我们愿意冒更大的风险来承担没有紧急状态权可能带来的问题。倘若认为没有考虑到的危险便不会发生，这将是一种危险的错觉。如果这种危险确实发生了，那么就不存在规范性的规定，消除危险只能靠事实的力量。人们可能会试图通过一项过于积极的紧急立法来证明所采取的措施是合理的。但是，这种过于积极的紧急立法的内容“必然无戒律”。因此，其不包含任何规范性的规定，也就不可能产生任何规范力。因此，在《德国基本法》中放弃对紧急状态权的规定，是宪法对事实力量的屈服。没有对紧急状态权的规定，便无法检验宪法规范力是否得到维护。唯一悬而未决的是，国家宪制生活是否以及如何重新回到规范的状态。

没有人会希望本文所提及的宪法规范力与政治和社会现实之间的紧张关系演变成两者之间的严重冲突。这种冲突的结果无法预测，因为即使是在冲突的情况下，宪法保持其规范力的条件在当下也仅能部分实现。我们国家未来的问题究竟是权力问题还是法律问题，这将取决于宪法规范力及其基本前提——尊崇宪法的意志，能否得到恪守和强化。

康拉德·黑塞（Konrad Hesse, 1919—2005）。刘亚巍、曾韬译，本文德语版原载K. Hesse, Die Normative Kraft der Verfassung（1959），J. C. B.Mohr（Paul Siebeck），Tübingen，为黑塞1958年在弗莱堡大学法律系的就职演讲稿。

2024-12-05
我们童年的游戏是从哪里来的

丢手绢、打沙包、跳房子、翻花绳、跳皮筋……

一、我们小时候的游戏，都是哪来的？

老鹰捉小鸡/丢手绢/“东南西北”是哪国发明的？

丢手绢加拿大小孩也在玩，只不过手里不一定有手绢，名字叫“Duck, Duck, Goose”。

老鹰捉小鸡，英国小孩叫“狐狸与鹅”。

“东南西北”折纸游戏竟然变成大马特色了。

尼德兰画家彼得·勃鲁盖尔在1560年所作的这幅画里，画了80多种儿童游戏，其中不乏我们非常熟悉的滚铁圈，骑“马”、跳山羊、捉迷藏、抽陀螺、老鹰捉小鸡，甚至抛羊拐。

此时的中国是明嘉靖三十九年，日本是永禄三年。

在智利，丢手绢叫“Corre, Corre la Guaraca（快跑快跑小傻瓜）”，玩法跟我们大同小异；“123不许动”在希腊叫“我是一座雕像”，区别在于他们可能带了点cosplay的成分；“石头剪子布”在苏门答腊群岛叫“蚂蚁大象人”。抛羊拐在韩国抛的是石头，在东南亚抛的是小沙包，但游戏规则近乎一致。

不但如此，有文物反映，在公元前三百多年的古希腊和罗马，羊拐游戏就已经十分普遍，出土于庞贝古城的画作上甚至有两个女神玩羊拐的场景。

古希腊雕塑中，少女在玩羊拐（约公元前330—300年）

出土于庞贝古城的画作

类似古老的游戏还有翻花绳。

关于它最早的记录是在1768年，不同版本遍布世界各地，除了欧美之外，还包括非洲、澳大利亚、太平洋岛屿甚至北极。在英语俚语中，人们用“Cat’s cradle”来特指这个游戏；在俄罗斯，它被称为“弦游戏”；在以色列，这款游戏被叫作“编织”。

1765年的日本浮世绘，两名女士在玩翻绳游戏

“东南西北”很可能也是舶来品本土化的产物。

虽然主流观点认为折纸游戏的起源是中国，随后传播至日本，但“东南西北”这种形式的折纸布局现存最早记载是12世纪西班牙的占星文献，有人认为其起源大概率是西方宗教。

我们玩“东南西北”大多数时候是捉弄人，但这种被叫作“Paper fortune teller（算命纸先生）”的折纸玩具被英国儿童用来占卜。其玩法同我们类似，在内部各个面上写上各种事件和指令，由玩家报出方位及开合次数，最后对应的句子即为其未来之“遭遇”。

西方儿童用来占卜的“东南西北”

16世纪约翰·汉密尔顿大主教的占卜星盘（折痕同“东南西北”的折法一致）

关于跳房子最早的记载是17世纪，在1677年出版的一本名叫《Poor Robin’s Almanack》的书中，这个游戏被称为“苏格兰跳蛙”，其中有苏格兰人在找平的砖地或木板上划分扁或圆形的区域用来跳跃。此外，也有人认为跳房子的历史可以追溯到公元前1200年的印度或古罗马时代。

印度人管它叫“Stapu”，拉美地区叫它“rayuela”，在土耳其语里是“Seksek”，保加利亚称其为“дама”……总之就是全世界都在玩。

英格兰莫克姆的一种传统跳房子游戏

二、“民间传统游戏”全世界都在玩

有研究表明，类似“丢手绢”的游戏广泛流行于世界各地，如英国、德国、瑞典、美国、印度甚至加纳和智利等国，游戏形式几乎一致——在游戏过程中，大家通常会唱某一首特定儿歌，像我们的《丢手绢》，法国的《邮差没有来》。

在美国，“丢手绢”叫“鸭子，鸭子，鹅”，玩法也是一群人围成一个圈，而一个人喊着“duck”转圈，直到在某人身后喊出“goose”，追逐者换人。这种游戏在美国的不同地区有变体，比如“Drip, drip, drop（滴滴滴）”“Mush pot”。

在印象中，小时候玩的游戏里，玩法同“丢手绢”很像的，还有“白毛女”——小孩们拉着手围成一圈唱歌，圈中蹲一个人蒙着眼，在歌谣停止时指出一个人代替他。

日本也有类似的游戏“笼中鸟”，但与我们玩时大声喊出“白毛女就是你”并随机指一个人不同，日本的玩法多了一些神秘学意味：歌声停止时，站在当“鬼”的人正背后的人，就要代替“笼中鸟”当“替死鬼”。比起这个游戏本身，我们更为熟悉的是游戏时唱的这首童谣《かごめかごめ（笼中鸟）》，它的变奏曾以各种形式出现在《犬夜叉》等动画里。

正如《丢手绢》这首歌是作曲家关鹤岩在1948年为了延安保育员的孩子们游戏所作，游戏的出现早于儿歌。“白毛女”游戏时唱的歌谣，明显出现于1951年《白毛女》电影上映之后。从文献上看，《かごめかごめ（笼中鸟）》这首童谣是江户中期以后出现的，且也有极大可能是为了配合游戏形式所作。

虽然歌曲的创作背景各有不同，但纵观游戏形式本身与它所流行的地区，很容易看出其中贸易往来与殖民主义的影子。

此外，关于儿童游戏的发源和传播虽然少有学者考证，但也不是完全没有。

有学者专门研究过“老鹰捉小鸡”。这个小游戏的足迹遍及除南极洲以外的六大洲，大多数国家都将其作为本国的传统民间游戏来看待。而在中国，不同的民族也都认为其是自己民族的传统游戏，同本民族的文化有着千丝万缕的联系。

在日本，这个游戏被叫作“比比丘女”，源于1300年前的平安时代中期，后来演化为“捉鬼子”。在韩国济州岛，它被叫作“大雁”，被认为是韩国传统文化的一部分。在越南，它被认为是起源于童谣舞曲的“龙蛇”游戏；在俄罗斯，它被叫作“鸢”，在本土传播了几个世纪；在土耳其，它被叫作“狼爸爸”，同土耳其人半狼半人阿塞纳的传说有关；在英国，它叫“狐狸与鹅”，与游戏相关的歌谣有三百多年的历史；在马达加斯加，它叫“拉萨林德拉”，早在法国人入侵之前就存在……

1818至1830年间，歌川国芳绘制的《新板儿童游戏浮世绘》。比较有意思的是，从世界范围来看，疑似只有中国和日本的玩法包含了“被抓住就要改换阵营”这项规则。

据学者考证，如果一定要给“老鹰捉小鸡”的传播路径找一个历史脉络的话，它的来源很有可能是古印度“尸毗王割肉养鹰救鸽”的传说。但无论真相是否如此，“老鹰捉小鸡”在世界范畴内的广泛存在是因传播居多还是独立演化占主流，其成形的核心一定是一些我们所熟知的底层逻辑——勇敢、善良、守护，为了他人挺身而出的信念，自我牺牲的觉悟。从这个角度来讲，无论哪种说法都说得通。

学者彼得·弗兰科潘说：“大陆与大陆之间在相互影响，中亚大草原上发生的事情可以在北非感同身受，巴格达发生的事件可以在斯堪的纳维亚找到回响，美洲的新发现会影响中国产品的价格，进而使印度北部的马匹市场需求剧增。”

“儿童游戏的变迁与传播历程印证了古今文化的共通性”，像一根来路不明的引线，串联起整个的人类文明。

从这个角度来说，可能真的是“你的童年我的童年大家都一样”，这个世界是一个巨大的闭环。

三、儿童游戏，文明史中的善意角落

如此之多的儿童游戏近乎全球统一，是巧合吗？

不排除有巧合的成分，确实有些相似文化产物可以在完全不同的社会条件下被独立孕育。

但对于“游戏”这一贯穿智人进化全过程的行为，更大的可能性，依然是“传播”。

数字时代之前，我们小时候习以为常的东西，经常有匪夷所思的历史源流。其中最知名的案例，应该是“七颗星星”的故事。

关于这个故事，有一种说法是——像每个中国人小时候都听过七颗星星变成七仙女的传说，希腊神话中有七姐妹星的故事，澳大利亚原住民也有类似的故事。至此社会学家发现，几乎包括少数原始族裔在内的全世界大多地方都有类似的传说，然而我们如今只能观测到模糊星云中的六颗星。

至此，“比较神话学”发现，这也许是人类第一个故事，它成型于人类走出非洲之前，是所有人类曾为一母同胞的证据。

至于这些我们小时候习以为常的游戏，早在“文化传播”这个概念形成之前，就已经在世界范围内传播了。

儿童游戏简单的框架和逻辑中，所蕴含的是全人类共通的朴素哲学和文化基因。

从“体育”这一伴随人类发展的早期教育概念展开，“老鹰捉小鸡”的本质是家庭与责任感，守护与抵抗；“白毛女”或“笼中鸟”中隐含着对社会性压迫和囚禁的反抗；羊拐和翻绳体现着对简单物质的最极致利用……

而无论是哪一种儿童游戏，其最本质功能之一，是对人与人关系的维系。

游戏，是人在成长过程中最早的社会化行为。遵从游戏规则，便是一种社会实践。

儿童游戏的附属品，是伙伴，是团队，是从周边衍生而来的关系。所谓“有人跟我玩”，是一个人从童年时期开始建立的，最初的社会支持与安全感。

而既然儿童游戏建立在“人与人关系”的基础上，那它们无论传播多远，跨越多漫长的时间，遍布多少形形色色的人种、国家和民族，好像都是一件很正常的事情。

毕竟，这个世界由人构成，是所有人与人关系的集合。

而这些童年游戏的存在，更是人类曾在大众所忽视的地方彼此友善过的证据。

如果不是有人在孩童阶段牵起那双同自己不一样的手，这些游戏又是如何在充斥着战争、侵略、迫害，贸易与文化倾轧的人类文明史中悄然传播的呢？

2024-12-04
冯裕强：集体化时期工分稀释化视域下乡村公共产品供给研究——以广西容县华六大队为例

改革开放以来，不少学者对人民公社制度进行了批判，认为它是低效率的、平均主义的制度。例如，有学者认为由于集体经济产权不完整，影响社员的生产积极性，最终导致劳动质量降低。①但是，也有不少学者对此进行反思，认为导致平均主义的原因主要是国家进行工业建设，加上当时的国际环境等因素，不得不从农村抽取剩余产品，而且人民公社在20世纪60年代初“去工业化”后，大量劳力只能进行单一的农业生产，产出极为有限。即便如此，农村还进行了大量的农田水利建设，这对后来的发展起到重要的铺垫作用，也应算作当时的劳动效率。②争议难分高下。笔者不揣浅陋，试图从工分稀释化视角对乡村公共产品供给进行考究，以期对人民公社有更全面的了解，同时也为当下乡村振兴提供经验启示。

“工分稀释化”，虽有学者提到相近的概念或现象③，但至今尚未有学者对其进行明确定义。笔者以为，工分被稀释主要包括两方面：一是工分的直接稀释，即把非农业生产的工分拿回农业之内进行分配，从而导致工分被稀释，分值下降；其次是物资的间接稀释，即国家、集体从生产队抽走大量物资，从而使队内可分配给社员部分总额减少，最终造成工分贬值。

乡村公共产品是巩固农业基础地位、保障农村社会稳定、促进农村经济可持续发展的重要基础。国内学术界对农村公共产品的界定④大体一致，主要是指乡村中由集体或政府提供，为广大村民的生产、生活服务，具有一定非竞争性和非排他性的社会产品，具体包括农村基础设施、农田水利主干网络、基础教育、公共卫生、社会保障等。

一、农田水利基础设施

华六生产大队（以下简称“华六大队”）位于广西壮族自治区容县南部，隶属于石寨公社，距离县城20多公里，是汉族聚居地，面积约为19.33平方公里，共有十个生产队。⑤容县面积为2257平方公里，其中陆地占97.51％，水域占2.49％，⑥境内岭谷相间，丘陵广布，俗称“八山一水一田”。由于地处山区，为了更好发展农业，华六大队在集体化时期修建了大量农田水利设施。据统计，1963年—1966年间，华六大队修建了大陂、三蛤、枪刀山和长冲等水库，⑦大部分生产队都有受益的山塘或水库。

为了修建这些水库，必然要耗费大量劳动力。华六大队除了平时抽调社员进行水利建设外，还组建了20人—30人的专业队从事农田水利基本建设。曾任记分员的陀某说：“专业队就是专门开田、开荒、种山，每个生产队抽出几个人。比如我们大队有几十个人，天天都在专业队干活，生产队一样要（给他们）记工分。”（TXL，四队记分员，2017年3月16日）⑧专业队的职责很多，包括水利建设、开荒、大队企业、护理林场等，劳动收入归大队所有。曾任林业员的庞某回忆说：“山上的林木就由专业队队员种植，以前（1958年“大炼钢铁”）烧得太光了，没有林木了。每个队要2—3个，都是年轻的男女民兵。”（PWQ，华六大队林业员、党支部书记，2017年3月20日）曾参加过专业队的肖某也说：“县有县的专业（队），乡有乡的专业（队）。像最大的石剑水库、小垌水库，还有乡的红田水库，每个队抽几个人去。那些水库都是那些人去做的，统一调动。”（XYH，六队专业队队员，2017年4月15日）

生产队一年中要进行大量的农田水利基本建设，那么这些非农业生产用工究竟在总用工中占多大比例？以1975年为例，根据各级单位的统计数据，八队、华六大队、石寨公社（统计7个生产队）和容县（统计233个生产队）的农业生产用工占总劳动日的比重分别是82.63%、83.70%、85.95%和82.70%。⑨一般而言，统计的生产队越多，就越接近整个县的平均水平。总体上看，公社以下各级单位的生产用工占总劳动日的比值都在容县的生产用工占总劳动日的比值——82.7%上下浮动，也就是说整个县大约要用17%的劳动日去从事非农业生产。这并非特例，在山西省东北里生产队，1977年的非农业生产用工占比达7.7%，这还不包括高达18.98%的农田基建工。⑩可见，在集体化时期，大量非农业生产用工存在于全国各地。除了基建用工，还有各级专业队队员、生产队干部、集体抽调的社员都要回生产队记工分，这些人员的劳动对当年生产队收入的增加并未起直接作用，因此，在外面挣的大量工分拿回生产队进行分配，必然会稀释生产队的工分值。

那么生产队的实际工分值在集体化时期有什么变化呢？本文从容县档案馆保存的历年分配统计表中整理出表1。(11)

通过表1可以看到各级单位从1963年到1981年社员分配收入和工分值的变化情况。在生产队一级，由于资料的缺失，我们只能比较完整地看到20世纪70年代的数据变化。总体上，八队从1971年至1981年分配给社员的金额、工分值和人均分配收入都呈波浪式上升，在1979年达到最高值；华六大队与石寨公社在相同的项目上虽然也呈波浪式上升，但是振幅相对小得多，除了个别年份回落，大部分年份是逐年增长的。分配给社员的金额与工分值、人均分配收入总体上呈正相关。分配给社员的金额越大，工分值和人均分配收入越高，就意味着人民公社在增产的同时社员也实现了增收，集体经济运行良好。三个不同区域都在1979年达到最高值，人均分配收入分别达到98.9元、77.95元和84.44元。需要注意的是，这里的分配金额并不是真金白银，而是生产队把一年所有的劳动产品和收入都折算成货币，扣除所有费用和税金之后的纯收入，生产队实际拥有的现金并没有这么多。

工分值的高低取决于两方面，一是生产队分配给社员的金额，二是工分的总量。分配给社员的金额是用总收入减去各项支出后得到的数据。而生产队的总收入是农业、林业、畜牧业、副业、渔业和其他收入相加的总和。虽然国家规定生产队应该以发展农业生产为主，12但是非农业生产对生产队的收入也有重要影响。那么生产队的农业生产和非农业生产收入占比各有多少呢？我们以1974年为例。

本文发现，1974年，农业生产收入占总收入的比重，从生产队到公社再到容县是逐渐降低的，但容县比玉林地区的平均值低了近13个百分点，也就是说容县的非农业生产收入占总收入的比重比玉林地区高出约13个百分点（见表2）。通过对各年份数据进行比对，13%是容县非农业生产收入占比超过玉林地区的正常比值。那么是什么原因导致容县的非农业生产收入占比如此之高呢？要解决这一问题，必须了解林业、畜牧业、副业、渔业和其他收入的占比具体是多少，进而明了容县与玉林地区拉开差距的原因。

经对比，在畜牧业、渔业和其他收入占总收入的比重方面，容县和玉林地区相差不大，差异产生的主要原因在于林业和副业的收入占总收入的比重，容县比玉林地区分别高出6.33个和4.82个百分点（见表3）。林业收入主要来源于山林，容县地处丘陵，全县有480438人，水田面积为29万亩，人均水田面积仅为0.6亩，山地总面积为225万亩，人均山地面积为4.67亩。(13)“全县179个大队，山区大队98个，占全县大队55%……一九七一年生产木材28234立方米，占全县木材生产31874立方米的90%。”(14)华六大队就是这98个山区大队之一。据1960年普查，华六大队总面积为24038亩（约16平方公里），其中林地面积为17189亩；(15)1974年，华六大队有1778人和1813亩耕地（1683亩水田），(16)人均有9.67亩山林、0.95亩水田。如此丰富的森林资源，林业产品具体又有什么呢？1974年的统计年报表显示，石寨公社造林719.3亩，其中用材林（松木和杉木）483.4亩，油茶196亩，玉桂14亩；收获的林副产品有：油茶籽515.9担、油桐籽73.45担、松脂9215.95担；收获的水果为：沙田柚41.5担、龙眼65担和荔枝88.6担；另外还有茶叶96.87担、桑蚕茧129.16担等。这些产品收入是属于林业收入还是副业收入？此问题涉及容县林业与副业的收入来源问题。在八队分类账本中，林业收入主要来源于售卖原木，副业收入内容则更多，包括松脂、纸浆、茶叶、砖瓦、石灰等。这与1975年容县林业局统计分类相符。1975年容县产量较大的林副产品有：油桐籽812担、松脂238131担、木柴183541担、木炭2199担、土纸（纸浆）4220担、沙田柚142830担；木材产品35489立方米（原木30911立方米）。(17)因此，容县的林业收入主要来源于各类木材，副业收入则主要来源于松脂、木柴和沙田柚等，而就收入占比来说，松脂的收入无疑是最大的。早在1963年，容县就申请建立容县松脂基地，通过调配物资和劳动力有计划地造林和割松脂。(18)1972年，十队割松脂收入达4720.05元，除去人工和材料，净收入3794.2元。(19)正是有了松脂和其他各类林木和林副产品，才使得容县的非农业生产收入占总收入的比例远高于其他县。

明晰公社的各项收入后，可以发现，表2中分配给社员的部分占总收入的比重，八队与其他各级单位之间差别较大，除了八队的超过60%，其他各级单位都在55%以下。这意味着整个地区人民公社平均分配到社员的部分占比并不高。导致这样的原因与生产队的管理水平有着密切关系。八队与其他单位相比，税率（主要是农业税——公粮）和集体提留基本保持一致，相差不大；其缴纳的公粮基本保持不变，高产年份会稍提高，减产年份会稍减；集体提留主要包括公积金、公益金、储备粮基金、生产费基金和统筹金，这些不管如何都是要拿出来发展生产和上交集体的。关键是在费用支出占总收入比重方面，八队比玉林地区全部生产队的平均水平低5.66个百分点。根据八队的账本和收益分配统计表的金额，本文计算出八队在1977年和1979年分配给社员的部分占总收入的比重分别为66.9%和64.8%，20分配给社员部分占比很高，说明八队在支出控制和经济管理方面做得比较好。

费用支出主要包括生产费用、管理费用和其他费用，支出越多，能分配给社员的收入就越少，工分值就越低，所以费用支出直接影响工分值的大小。那么，其他生产队的费用支出高的原因到底是什么？在玉林市档案局笔者发现一份1976年的档案——《关于人民公社收益分配的情况问题和意见》，其内容可以较好地说明这一问题。

该份档案主要是对1975年玉林地区人民公社收益分配中的分配收入出现的一些问题进行总结并提出整改意见。1975年全地区粮食大增产，但是分配给社员的部分占比并不高，主要原因是费用开支大，全自治区费用支出占总收入的27%，但是玉林地区费用支出占总收入的比重高达33%。费用开支大的原因有八点。第一，有的地方发展生产不坚持“自力更生”原则，远途高价购买或调换化肥，费用开支大，生产成本高；第二，有的地方农田基建补助花样多，标准高，集体负担重；第三，有的地方扩建学校，增加民办教师，从而增大了集体费用的开支；第四，有的地方变相增加脱产人员，加重了集体负担；第五，有的地方社员上调家禽、生猪派购任务，要生产队补钱、补粮，增加集体负担；第六，有的地方的乱支乱补、大吃大喝、请客送礼、挥霍学杂费等不正之风还没有彻底刹住；第七，有的地方搞账外分配，或者高价（市场价）买入猪肉，然后按照牌价（较低价格）分配给社员；第八，有的地方存在贪污、挪用、超支欠款等问题。21从这些原因中，可以看到生产队在经营管理中存在的各种问题，虽然说这些现象并不必然存在于每个生产队，但是如果不严格控制支出，必然会严重影响社员的收入。

在表1中，本文还注意到工分值的变化。八队的工分值从1965年的0.35元逐渐上升到1981年的0.53元，1979年和1981年都突破了0.5元。由于影响工分值的因素非常多，生产队能够保持增长已属不易。1963年，华六大队的工分值为0.19元，此后逐步增长到1981年的0.55元。与华六大队相比，石寨公社的工分值增势更为平缓，在20世纪70年代总体保持在0.4元左右。这三级单位的工分值虽然涨幅不大，总劳动日却大量增加。通过计算可知，八队在1979年的劳动日是1965年的2.1倍；华六大队和石寨公社1979年的劳动日都是1965年的1.76倍。工分主要是靠劳动力挣的，劳动力越多意味着工分越多。1979年，八队的劳动力为70人，是1965年45人的1.5倍；华六大队1979年的劳动力为915人，是1965年719人的1.27倍；石寨公社1979年的劳动力为12643人，是1965年9118人的1.39倍。(22)可见劳动力的增加速度远没有工分的增长速度快，工分的快速增长必然导致工分被稀释。同时应注意到，人口增长特别是劳动力增长自然使劳动工分增加，但是在单位土地上投入过多的劳动，并不会带来同等幅度的增产。以八队和石寨公社为例，经笔者计算，八队1979年亩产1126.93斤，是1965年亩产886.87斤的1.27倍，而同期八队工分总量增长了1.1倍；1979年石寨公社亩产为1146.15斤，是1965年亩产917.31斤的1.25倍，工分总量却增长了0.76倍。(23)即便扣除部分工分用于非农业生产，工分的增长速度仍高于每亩的增产速度。这便是黄宗智所讲的“过密化”或“内卷化”现象：“在人多地少和土地的自然生产力有限的现实下，单位土地面积上越来越多的人力投入只可能导致其边际报酬的递减。”(24)

为避免农业生产上的过度内卷，充分利用劳动力巩固和发展集体经济，1959年年初中共中央农村工作部对当年全国农村人民公社的劳动力进行了分配规划，提出将51.4%的农村劳动力用于农业生产，剩余的48.6%用于国家工业交通、林牧渔副业、社办工业、交通运输、基本建设、生活服务等方面；在农业中，从事粮食生产的约为8000万个劳动力，占农村总劳动力的38.1%，从事其他作物生产的约有2793万个劳动力，占农村总劳动力的13.3%。25也就是说，从事农业生产的劳动力仅占总劳动力的一半，而真正种植粮食的劳动力不到4成。26所以十队的一位妇女说：“强的劳动力又抽出去了呀，就剩下二、三级的婆娘在家了，有的上山搞副业，没有多少劳动力的。” （XJA，十队社员，2017年3月25日）五队老队长补充道：“（种田）天天都是那帮人的。上调的人做不了，他不做这个就去做那个，做田就是做田的，我搞副业就是搞副业，分了工的。”（WGM，五队队长，2017年3月24日）

笔者在各生产队的账册中，发现不少专业队和副业人员的回队账单。例如，十队“1971年5月10日，收许有昌交款回队12—3月48元”（修建广西金红铁路，简称“6927工程”），27八队“1972年1月26日，收其文11—12月回队款23.6元” （专业队修船坝），28“1977年3月 14日收世天泥水工入队8元”。29当时规定专业队队员和从事副业的人员必须按一定比例将收入交回生产队，生产队按同等劳动力记工分，这样才能参与生产队的分配，同时生产队还要按时给外出的专业队队员寄口粮。例如，1969年广西从玉林抽调民工18000人，参加金红铁路修建工程，其中容县被抽调3000人。工程文件规定，“民工的生活待遇，每人每月30元，其中40%交回生产队，参加生产队分配，60%由民工个人支配。民工的口粮供应，除从生产队带足本人的口粮外，按工种定量标准，不足部分由国家供应”。30由于路途遥远，口粮是无法送去的，生产队只能通过转账的方式给民工购买口粮，如八队“1970年9月20日，支成才转6927（工程）粮200斤，每百斤9.3元，金额18.6元”。31可见，除去各级专业队队员、副业人员、民办教师等精干劳动力，真正进行农业生产的劳动力是很有限的。在非常有限的劳动力从事农业生产的情况下，其产出自然不会太高。

1974年，广西壮族自治区革命委员会水利电力局提出要大力组建专业队。“不但骨干水利工程要坚持常年施工，而且社社队队都要组织农田基建专业队，大搞常年施工。一个队、一个社、一个县如果抽出百分之十的劳动力，一年坚持施工十个月，就等于抽调百分之五十的劳动力每年突击两个月要完成的工作量。”32在容县，仅从1974年至1975年2月25日，全县动工大小水利工程727处，完工243处；完成造田、造地10896亩（其中造田5337亩，造地5559亩），另开茶山地9059亩；完成改土面积11.63万亩，共用去452.8万工日。33那么在集体化时期，容县在农田水利基本建设上大概用了多少工呢？

图1显示了新中国成立后，容县在集体化时期农田水利基本建设完成劳动工日的变化。由于这是官方统计资料，所以其中的数据只统计较大的工程，如华六大队除了大陂水库，其他四个小水库均未统计在内，(34)即还有很多大队、生产队自主修建的小型水库、山塘、沟渠等都没有统计在内。即便如此，以上数据也在总体上体现了集体化时期劳动投入的规律。新中国成立初期，由于国力较弱，集体经济制度还未健全，人们只能对小型水利设施进行修缮，投入的材料和劳动都很少，只有23.5万工日。1953年至1959年间，容县从农业互助组过渡到人民公社，完成的工作量明显增加，完成劳动日(35)也随之剧增，特别是1958年前后，也就是在“大跃进”时期，劳动投入达到一个小高峰，共投入565万工日。在1970年到1978年间，无论是在工作量上，还是在完成劳动日上，都呈现梯度式剧增之势，特别是在1976年，达到历史的高峰，耗费了1007万工日。(36)1980年以后，农田水利基本建设基本处于停滞状态。另外，在所有完成劳动工日中，水利用工占了绝大部分，主要是用于兴修大小型水利工程。1980年，由于集体经济的解体，大量农田水利基本建设失去了生产队的人力和物力支持。

综上可知，一方面，在集体化时期，容县乃至整个广西都抽调了大量劳力进行农田水利基本建设。采取的方法是，专业队常年施工与群众性突击相结合。专业队不仅有建设专项水利工程的，还有从事造田、造地等农田水利基本建设的，另外在级别上还分为大队级、公社级和县级的专业队。这样无论是在农忙时，还是在农闲时，大量劳力都被抽调出去进行各类农田水利基本建设。另一方面，这些农田水利基本建设属于共同生产条件的改进投入，对山区生产队的农业生产尤为重要。虽然短期内对农业生产的影响并不明显，但在灾荒之年，它在一定程度上可以避免或者降低灾害带来的减产程度，甚至可以保证部分农田旱涝保收。

二、生活性公共产品

人民公社除了为当地提供大量农田水利基础设施外，还为广大社员提供了各类生活性公共产品，包括文化教育、医疗保健和社会救济等。这些公共产品并不是完全由国家来提供，绝大部分是由当地人民群众自力更生、自筹解决的。这些公共产品的积累并不会在短期内提高农业生产率，只有经过较长时段后，才能显现它们的作用和影响力，所以卢晖临主张要“打开视野看效率”，特别是延后的效率。37而要实现这些积累，社员不得不从相对干瘪的腰包中再掏出一部分劳动产品，这样就会导致分配给社员的产品总量减少，体现在工分上就是工分贬值，进而影响社员的生产积极性。

（一）以民办教师为主的基础教育

1969年之前，华六大队有两所小学，共4名公办教师，当地整体文化水平较低。据1964年第二次全国人口普查，华六大队有1392人，具有初小（小学一年级至四年级）以上文化水平的只有664人，占总人口的47.7%；石寨公社有18130人，具有初小以上文化水平的有9425人，占总人口的52%，其中只读完初小，13岁—40岁的青壮年有2686人；读完高小（小学五年级至六年级）的有3348人；初中文化水平有606人；高中文化水平有90人；拥有大学文凭的只有9人。38为提高广大人民群众的科学文化水平，1969年广西要求各地将农村公办小学下放给大队、生产队办，农村公办中学下放给当地社、镇革命委员会直接领导和管理。经县、社统一调整后，仍缺教师的大队，根据实际需要，选拔民办教师充实教师队伍。选拔的要求是：家庭出身好，并有一定教学能力，如果是复退军人和知识青年，则优先录用。对于这些民办教师的工资待遇，补助多少由贫下中农讨论决定。39

1970年，华六大队共有4所小学，1所小学附初中，公办教师7人，民办教师9人。40在集体化时期，公办教师的薪酬全部由国家支付，而民办教师的薪酬由生产队承担（统筹）。华六大队的年终统计表显示，1973年，十队上交了981斤统筹粮和161元统筹金，其中统筹金是为4名大队干部、4名民办教师以及1名兽医上交的。41然而，同年，华六大队共有13名民办教师，一般生产队原则上选派1名教师，十队由于和九队合开一所分校选派了2名。据当时的大队干部介绍，并不是所有的民办教师都可以统筹，只有教得比较好的才有资格统筹。至于没有得到统筹的教师则回各自的生产队记工分，大队再发给少量的补贴。42

随着教育事业的发展，到1978年，容县有6161名教师，其中民办教师3626名，占教师总数的59%；43华六大队共有7名公办教师，16名民办教师，44民办教师占比约为69.6%。由于小学教师大部分是民办教师，业务水平低，课堂教学中出现差错屡见不鲜，再加上“半天学习、半天劳动”，“以劳代学”的教学安排，学生学习规律被打乱，知识基础较差，甚至出现大量留级现象。为提高教学质量，容县积极采取多项措施，通过举办轮训班，办函授学校、进修学校，开展巡回辅导等提高教师的业务水平。45

教学质量不高，除了教师能力不足以外，民办教师的工资待遇没有得到很好的保障，使其不能安心教学也是重要原因。“我县民办教师（不包括自筹教师）的生活待遇有两种，一是国家补助加大队统筹，二是国家补助加生产队记工分，不足部分由学校学费或勤工俭学收入补足。不管是采用哪种办法的，都有较长期拖欠民办教师工资的问题。”46拖欠情况包括：一是教师工资统筹不上来。例如1978年，华六大队5个民办教师，总共被拖欠工资551元，人均被拖欠110.2元。二是教师工资未发齐。部分大队给民办教师一年只发十个月的工资，而且每月工资未达到初中教师30元、小学教师24元的标准。三是教师粮食收不齐。部分大队规定，民办教师的粮食要本人到各生产队收，然而，实际上有的粮食收不全，有的收到次等谷。47可见，当时民办教师待遇存在长期拖欠和粮食以次充好等问题，极大地影响了教师的正常生活。

根据收益分配统计表的数据，华六大队在1980年已经从1979年的13个生产队分为18个生产队，不少生产队内部开始酝酿分田分地了。民办教师的工资和粮食主要由生产队提供，生产队的解体必然会引起民办教师群体的动荡。1981年，容县教育局在汇报普及教育工作时指出：“我县最难解决的问题有：我县民办教师比例大，群众负担较重，近年来，由于生产队体制的改变，民办教师的粮、款很难统筹解决，严重地影响着民办教师生活和工作的安定。”48十队的许某正是由于分队，导致报酬没有兑现，退出了教师队伍。“（19）80年分田到户，这里（十队）分成三个小队，我们没有统筹得上，我就不做了。”（XJA，十队民办教师，2017年3月25日）由于民办教师和其他上调人员的物资、工分很难从生产队进行统筹，教师队伍面临严峻挑战。从表1中1979年至1981年的数据变化便能推断出各级单位大量缩减支出。虽然华六大队和石寨公社的劳动日与分配给社员部分的金额都减少了，但是分配给社员部分的金额占总收入的比重增加了（华六大队增加了7.54个百分点、石寨公社增加了4.93个百分点）。

为解决民办教师的教学质量和后勤保障问题，1981年，容县教育局和财政局开始整顿民办教师队伍，辞退了思想品质、业务水平和健康状况不能胜任教学工作的教师，业务能力强、业绩突出的民办骨干教师则被吸收为公办教师。49华六大队的民办教师除了部分因为工资低没有坚持下来，大部分后来都转为编制内教师，成为真正的骨干力量。

容县在1950年至1981年间，小学生人数从13236人增加到78134人，教职工从641人扩大到3749人（含民办教师1910人）；中学生人数从898人增加到30435人，教职工从98人增加到2317人（含民办教师1039人）。50小学生数量增长了近4.9倍，教职工人数增长4.8倍，适龄儿童的入学率高达93.4%。1981年，容县中小学民办教师人数仍占教师总数的48.6%。据统计，当年全国有民办教师近396.7万人51，占教师总数的47%。支撑这支庞大的民办教师队伍的是全国600多万个生产队52。保守估算，一位民办教师的月工资约为24元，国家和生产队各承担12元，生产队每年还要另外提供600斤口粮（100斤口粮折价为9.5元，600斤口粮折价为57元），因此每位民办教师需要生产队每年为其支付201元，全国396.7万名民办教师每年需要生产队支付79736.7万元，10年便接近80亿元，平均每个生产队10年共支付1333元支持基础教育。事实上还有相当一部分民办教师没有得到统筹，需要回生产队记工分参加分配。53当然，这些支出是值得的。据1982年第三次全国人口普查统计，华六大队有1720人，小学以上文化水平的人数为1398人，占总人口的81.28%，比1964年高出33个百分点；整个公社有24492人，小学以上文化水平的有20275人，占比为72.95%，比1964年高出20个百分点。54可以说，民办教师在广大农村地区，极大提高了社员的科学文化水平，这些学生在改革开放后，逐渐成长为社会主义建设的主力军。
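上面这段保守估算可以用几行算术复核。以下只是一个最小示意：变量名均为说明而设的假设命名，数字全部取自上文；“10年接近80亿元”和“平均每个生产队1333元”沿用的是上文先把十年总额取整为80亿元、再按生产队数平均的算法。

```python
# 复核上文对民办教师负担的保守估算（变量名为示意性假设，数字取自上文）
team_share_per_month = 12        # 生产队每月承担的工资（元）
grain_jin_per_year = 600         # 生产队每年提供的口粮（斤）
grain_price_per_100_jin = 9.5    # 每100斤口粮折价（元）
teachers_wan = 396.7             # 全国民办教师人数（万人）
teams_wan = 600                  # 全国生产队数量（万个）

grain_cost = grain_jin_per_year / 100 * grain_price_per_100_jin   # 口粮折价（元/年）
annual_per_teacher = team_share_per_month * 12 + grain_cost       # 每位教师每年由生产队负担（元）
annual_total_wan = teachers_wan * annual_per_teacher              # 全国每年合计（万元）
ten_year_yi = annual_total_wan * 10 / 10000                       # 十年合计（亿元）
# 按上文做法：先把十年总额取整到亿元，再平均到每个生产队（元）
per_team_ten_year = round(ten_year_yi) * 10000 / teams_wan

print(annual_per_teacher, round(annual_total_wan, 1),
      round(ten_year_yi, 2), round(per_team_ten_year))
```

若不先取整而直接用十年总额平均，每队的数字会略低于1333元，两种算法只差几元，不影响上文的量级判断。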

（二）以赤脚医生为主的公共卫生

除了教育事业，农村的医疗卫生事业也主要由生产队负担。在毛泽东要求“把医疗卫生工作的重点放到农村去”的“六二六”指示的推动下，全国各地都把这项工作当作一项重要的政治任务，迅速组织医疗队，开展农村合作医疗。

为积极响应中央号召，使广大群众看得上病，看得起病，吃得起药，1966年5月1日，容县人民委员会卫生科根据中央文件，制定《关于实行合作医疗的卫生所的有关意见》，这份文件成为容县后来开展合作医疗的重要纲领。“凡实行合作医疗的区，则在全区范围内看病不收诊费（门诊、出诊）、注射费、处置费；凡有条件的卫生所要开设中、西药柜，以利方便病者，减轻社员合作医疗负担，解决医生部分工资和卫生所办公费，还可以解决部分贫下中农的医药困难的减免；医生到生产队巡回要背下乡中西成药下去，以利方便病者，但要实行保本保值，收入归卫生所；诊费、药价要坚决贯彻执行国家规定的标准收费；为了减少病者负担，每个医生（接生员）都要学会针灸、使用针灸和使用中草药医疗疾病。”55从这些意见中可以知道，容县主张医生要通过各种手段，尽可能地减轻人民群众的负担，收入归集体，强调中西医结合，特别要充分利用中药和针灸为社员治病。

要健全合作医疗制度，除了上述规定外，还要解决好医生的生活问题。1967年1月，容县发布《关于人民公社成立卫生所，医生、接生员实行合作医疗制度的通知》，规定每个公社成立卫生所，每个卫生所安排医生1—3名（逐步配备中、西医生1—2人），接生员1—2人。医生和接生员领取的粮食和工资全部由公社统筹解决，医生的月工资为15元至30元，接生员的月工资为15元至20元，另外，他们每月领取大米30斤；统筹粮由生产队统一送当地粮所，粮所则每月按量供应大米。56此文件对医生与接生员的待遇进行了相关规定，但是“粮食、工资全部由公社统筹解决”只是把问题抛给了公社，工资到底怎么解决并没有明确规定。统筹粮经粮所再转到医生手中虽然更有保障和方便管理，在现实中却很难实行，尤其像华六大队这样的山区大队，离县城路途遥远且崎岖不平，医生每月领粮既费时又费力，所以后来大部分大队都是让医生到生产队挑粮而不是到粮所领取。为进一步减轻人民群众负担，容县对药品、医疗器械的采购和零售价格作出规定：“今后凡已实行合作医疗大队的卫生室到所在供销社（县医药公司各门市部）采购中、西药品，医疗器械不论金额多少，一律按批发价作价供应。各卫生所一律按当地供销社零售价销售。”57这些规定从成本、服务等方面要求尽量以最低价格为广大人民群众提供服务。

由于合作医疗制度是个新事物，具体怎么做只能不断探索，寻找适合本地的制度。容县采取树立典型、相互学习的办法让合作医疗尽快办起来。1970年石头公社的合作医疗办得较好，成为各公社学习的对象。该公社的卫生队伍包括：大队医生、采制药人员、接生员和生产队卫生员。大队级人员的报酬向生产队统筹，实行工分加补贴的办法，医生一般每月补助5—10元，其他工作人员每月补助3—4元，卫生员则回生产队记工分。对于合作医疗资金的筹集，由生产队统一计算按参加人数支付。每人交1元，其中个人交0.5元，生产队交0.4元，大队和公社各交0.05元。在收费制度方面，大队卫生所一般收挂号费0.05元，出诊费0.1元，注射费0.1元，接生费每个小儿0.5元，这些费用由病人负担，药费全部由合作医疗开支。如果是重病号到公社以上医院治病或住院，合作医疗支付60%，剩余的40%由病人负担。合作医疗主张自力更生、全民办医，贯彻“三土（土医、土法、土药）”“四自（自种、自采、自制、自用）”方针。石头公社各级单位均设有草药室，以草药为主（用量要求达到70%—80%），中药为辅，适量西药备急，其中草药的来源为：抽专人采集和群众献药相结合，三级有专人采药、制药，采药、制药人员报酬由大队负责。58石头公社根据本社的经济状况，各级单位分摊社员的部分医疗资金，同时大力采用中草药治疗疾病。因为容县山多，药材丰富，可就地取材，加上生产队种植草药，大大节省了药费开支。

“合作医疗是收每个人的钱，那时没有收钱（看病），试过两年吃药不要钱，之后就不行了，反正像大队的企业那样。”（TFQ，三队赤脚医生，2017年3月17日）华六大队的佟某当时是一名赤脚医生，1947年出生，1965年在容县学医，1968年9月在大队开始行医。对于赤脚医生的报酬，他说：“最初几年就是吃工资，（19）68—（19）72年，24元每月，8毛钱一天。工资是从各生产队统筹，整个大队有副业人员、大队干部、医生，全部按照整个村有多少收入，再分配每个月多少钱，各个生产队抽多少上来，统一分配的。1972年以后就是吃工分，做医生就相当于搞副业一样，每天记12分。算起来就是三四毛1天，有些生产队只有两三毛，那时很穷的。以前我们容县村医大部分都是吃工分。一个月是50斤稻谷，一年600斤。”（TFQ，三队赤脚医生，2017年1月6日）但是，对于1972年以后一直是吃工分的说法，在曾任大队干部的陈某那里得到不一样的答案。“医生就是从利润那里支付工资，粮食就从村里统筹，老师的工资和粮食也是从村里统筹。”（CPY，华六大队会计辅导员，2017年7月6日）陈某1974年12月至1978年冬在大队任会计辅导员。59由于合作医疗制度在不断完善，华六大队根据上级相关政策，既实行过工分加补贴，也实行过工资制。

1972年，广西制定了《农村合作医疗制度试行草案》，规定“凡是参加合作医疗者，按规定交纳合作医疗基金或以药代金。基金由个人和集体（公益金）负担，负担比例由社队根据情况自行确定。由生产队统一计算按参加人数支付”；要合理解决赤脚医生的报酬和口粮，其报酬要略高于同等劳动力的水平。(60)“合理解决”意味着赤脚医生的报酬既可以是工资的形式也可以是工分的形式，只要合情合理，并能够调动医护人员的积极性就行，所以1973年，容县各公社的赤脚医生报酬存在各式各样的形式，例如“实行工资制，开支从收入中解决……合作医疗变大队企业，收入归大队，赤脚医生实行工分加补贴，全部向生产队统筹”。(61)到1974年，石寨公社有23个医生，报酬都是以工资的形式发放，每月工资最低20元，最高30元。(62)到1977年，《广西壮族自治区农村合作医疗章程》规定赤脚医生的报酬为：“实行‘工分加补贴’的办法，每年由大队根据赤脚医生的政治思想、工作表现、技术水平、劳动态度等情况评定，一般应略高于同等劳动力的收入水平”。(63)那么“工分加补贴”具体是如何实行呢？这在1978年《关于加强合作医疗基金筹集和稳定赤脚医生报酬的请示报告》中有介绍：“每个赤脚医生每月在队记260分或300分，每天补助贰角生活费，每月补助六元，有的补九元，按该医生所在队分值计算工分所得部分，平均每月加生活补贴不达24—30元的，再从合作医疗收入中补足。”(64)1979年，容县179个大队中，实行合作医疗的有155个，共有627名赤脚医生。在本大队报销的比例，大部分在30%—50%之间，上送报销比例在20%—40%之间，其中华六大队的合作医疗报销额度是30%。(65)

在广大赤脚医生的努力下，1982年，容县60岁—90岁的人口从1964年的26517人增长到42699人，占当年总人口的比重由7.07%提高到7.69%。(66)1982年全国人口已超10亿人，60岁以上人口比重达到7.62%，比1964年的6.13%高出近1.5个百分点，(67)这在一定程度上说明我国医疗水平和卫生保健系统更加完善，而这离不开无数赤脚医生和基层医护工作者的默默奉献。1980年全国农村赤脚医生总人数达146.3万人，其中女赤脚医生48.9万人，农村生产队卫生员235.7万人，农村接生员63.5万人。(68)而这些不脱产医护人员的工资、口粮主要靠生产队解决。仅就工资方面，医生的月工资为24元，一年为288元，146.3万人一年工资共为42134.4万元，10年便是42亿元。事实上生产队所付出的要远远高于这一数字。国家只支付了少量的管理费和药费，以非常低的成本构建了完善的农村医疗卫生系统，保障了社员的身心健康，提高了出勤率，促进了集体经济的健康发展。

（三）保障困难群众的基本生活

在大部分人的回忆中，似乎并没有什么困难户，因为大家都很穷。然而，贫富只是相对的。在各生产队的账本中，笔者发现不少困难户领取国家救济金的凭证。例如，笔者在八队的账本中看到1972年3月7日，“大队拨来仕华救济金10元，交丽梅领”，69后面还有刘丽梅的印章。大队保存的阶级档案显示，陆仕华生于1932年9月，1972年已40岁，一家6口人，育有两儿两女，均不满10岁。70从这些情况来看，陆仕华一家的生活非常艰难。

生产队用来救助军烈属、五保户和困难户的资金、粮食，一般是用公益金。公益金“要根据每一年度的需要和可能，由社员大会认真讨论决定，不能超过可分配的总收入的百分之二至三……生产大队对于生活没有依靠的老、弱、孤、寡、残疾的社员，遭到不幸事故、生活发生困难的社员，经过社员大会讨论和同意，实行供给或者给以补助”。71

除了上述困难户外，还有一类困难户往往被人们所忽视，那就是“超支户”，顾名思义，即一年的收入不足以抵扣一年开支的农户。社员一年的劳动收入是通过工分来兑现的，生产队通过工分把各种生产、生活资料分配给社员。如果他们一年的工分收入不足以抵扣其一年的开支，那么这一年不仅没有盈余，反而欠生产队的钱粮。本文以八队的陆仕忠一户为例展开说明。

1976年，陆仕忠一户共有7人，夫妻二人加五个子女，大女儿1962年生，14岁，属于半劳力；第二个是儿子，1964年生，12岁，其他均为10岁以下儿童。72从表4的支出中，可以看到，陆仕忠一家支出金额最高的是口粮，全年消耗口粮3548.1斤，平均每人消耗506.87斤，需支付335.17元，占总支出的91%；当年挣得工分8599.3分，每个工分值为0.38元，全年总收入为326.77元，不足以抵扣总支出（367.93元），超支了41.07元。

八队在1976年共有6户超支，11户有盈余，4户平收，总户数为21户，超支户约占29%。这个比例在华六大队应该说是非常低的。1976年，华六大队超支户高达186户，占比55.3%，欠款共计11287元，73不管是占比还是欠款数额都在集体化时期达到最高值。由于欠款数额不断累积，到人民公社后期，生产队处于入不敷出的艰难境地。

为何会产生如此多的超支户？这是一个不得不探讨的问题。

超支户的存在，表面上看是农户挣的工分不够多，不足以抵扣从生产队获得的生活物资，本质上是因为生产队的物资不足导致工分含金量不高，以至于农户的工分不够支付其生活开销。如果物资充足，每一个工分所含的物资就更多，大部分农户的工分是足以支付其生活开销的。而物资短缺又与农业的产出密切相关。那么农业产出为何不高呢？当笔者把这一问题抛给村里的老人时，往往得到的答案是：没有肥料和农药。

农谚说“有收无收在于水，收多收少在于肥”。“那个时候由于生产条件落后，种子也很落后，肥料在市面上也很少有卖。一般都没有肥料来卖，到后期才有这个碳铵和这个氨水。（19）80、（19）79年以前都是没有肥料供应的，基本上是山上的草皮泥，也就是这些人上山铲这个草皮泥来烧，烧了以后再撒到水田里面去，过去都是这样耕种的，也没有什么杂交种子，都是落后的种子，一般是（收获）200—300斤每亩，现在（每亩）都有1000—1200斤。”（CPY，华六大队会计辅导员，2017年1月6日）“过去主要是没有肥料，没有这个良种，现在则有良种、有农药，所以生产好，过去喝粥也难有喝。”（HZN，六队队长，2017年1月6日）

在八队1977年的分类账中，“农业支出”记录了一整年的所有支出项目。经笔者统计，八队当年共购买了复合肥2斤，尿素415.1斤；碳铵10950斤，包括一级碳铵和次级碳铵（肥力较低，价格较便宜）；农药品种有“乐果”“毒杀芬”“六六粉”“敌百虫”等；早稻浸谷种2270斤，晚稻谷种2884斤，共5154斤。74八队当年有130.5亩耕地，其中水田117亩，旱地13.5亩（4.2亩自留地），75除去自留地，集体实际拥有耕地126.3亩，两季共252.6亩，平均每亩施1.64斤尿素和43.35斤碳铵，每亩水田要22斤谷种，农药以“六六粉”为主。投入这些生产要素后，当年八队共收获109523斤稻谷，亩产468斤。76
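上述每亩投入和亩产可以按下面的方式复核。这只是一个示意性的草算：变量名为假设命名；其中化肥按集体耕地两季面积平均、谷种和亩产按水田两季面积平均，是根据上文给出的数字反推的口径，并非账本原文的说明。

```python
# 复核八队1977年的每亩投入与亩产（口径为根据上文数字反推的假设）
paddy_mu = 117.0          # 水田（亩）
dry_mu = 13.5             # 旱地（亩），其中含自留地
private_plot_mu = 4.2     # 自留地（亩）

collective_mu = paddy_mu + dry_mu - private_plot_mu   # 集体实际耕地（亩）
sown_mu = collective_mu * 2                           # 两季合计面积（亩）
paddy_sown_mu = paddy_mu * 2                          # 水田两季合计面积（亩）

urea_jin = 415.1                  # 尿素（斤）
ammonium_bicarb_jin = 10950       # 碳铵（斤）
seed_jin = 2270 + 2884            # 早稻、晚稻谷种合计（斤）
harvest_jin = 109523              # 全年稻谷收获（斤）

urea_per_mu = urea_jin / sown_mu
bicarb_per_mu = ammonium_bicarb_jin / sown_mu
seed_per_mu = seed_jin / paddy_sown_mu
yield_per_mu = harvest_jin / paddy_sown_mu

print(round(urea_per_mu, 2), round(bicarb_per_mu, 2),
      round(seed_per_mu), round(yield_per_mu))
```

按这一口径算出的结果与上文的每亩1.64斤尿素、43.35斤碳铵、22斤谷种和亩产468斤一致。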

此外，农业产出低还受到生态环境的制约。正如黄宗智对新自由主义经济学理论批判的那样，农业不同于工业，不是投入的生产要素越多，单位产出就越多，甚至总量和产出几乎可以无限制扩大。把农业等同于工业，本身就是对农业的误解。农业说到底是人在土地上种植植物的有机问题，而不是一个机器生产的无机问题。因为农业生产受地力和生态环境的限制，土地不可能无限产出。77很可能一场洪涝或者干旱就能把农民辛苦劳作一年的成果化为乌有。

从表5可知，容县在1969年—1982年的14年间，影响早稻的各类自然灾害频繁发生，发生率从高到低依次排列是：病虫害、龙舟水、倒春寒和夏涝。需要注意的是，表5并未统计对晚稻影响较大的寒露风。当这些灾害组合性地发生时，会给农业造成致命打击。例如1976年，由于倒春寒的发生，当地烂秧严重，既损失了大量稻种，又推迟了播种季节。不巧的是，当年不仅出现龙舟水，病虫害也大发生，由于预防及时和经营管理得较好，早稻损失不大。但是，由于早稻种植推迟，导致晚稻插播也推迟，这样就使晚稻在扬花灌浆期遭遇寒露风。“抽穗扬花期遇到寒露风天气，直接影响抽穗开花的速度，使空秕粒增多，降低千粒重，造成减产。”78当年水稻产量八队比1975年减收5874斤，人均分配口粮减少20斤；华六大队减产110075斤，人均分配口粮减少54.9斤；石寨公社减产1178295斤，人均分配口粮减少56斤。79这最终导致华六大队的超支户数量由1975年144户增加到186户，占比为55.3%。同年，容县减产3423万斤，人均分配口粮减少70.6斤，超支户由35005户增加到36779户，增加了1774户。80这些数据说明，农业生产深受生态环境的制约，尤其是自然灾害对农作物的影响。然而，经济学家们往往有意或无意地忽视了这一重要因素。生产队有超支户、平收户和盈余户，其中最容易由不欠生产队转变成欠生产队的农户是平收户。自然灾害对平收户的影响，就像“一个处身于水深没颈的人，即使是一阵轻波细浪，也可能把它淹没”。81可以说，生产经营中的任何一个环节出现异常，都有可能使平收户变为超支户。这也是为什么在集体化时期，人民公社要进行大量的农田水利基本建设。有了完善的农田水利设施，可较好地降低自然灾害对农作物的损害程度，使得农民在面对寒露风时，并不是无能为力。由于容县历年出现寒露风概率最多的时间是从每年10月11日至11月10日82，所以较好的办法是种植早熟和中熟的稻种，这样就可以让水稻在抽穗扬花期避开寒露风，但这需要优良的稻种。此外，根据广大人民群众长期的耕作经验：“有水不怕寒露风”，在寒露风到来之前往田里灌水，就可以保存地温和增加稻田小环境的温度，从而减轻寒露风对水稻的危害。83而要有大量水源，就需要水库贮存水，以及通过相应的沟渠和设施把水引入田中。

当然，生态环境并不是造成农户超支的主要原因，它只能在一定程度上限制生产队农业产出的总量。造成农户超支的主要原因是国家与集体从生产队中提取了过多物资。国家之所以提取大量物资，是为了满足工业化的需要。陈云在1950年6月说：“中国是个农业国，工业化的投资不能不从农业上打主意。搞工业要投资，必须拿出一批资金来，不从农业打主意，这批资金转不过来。”(84)刘少奇也认为：“发展中国经济，使中国工业化，是需要巨大的资金的……但是从哪里并且怎样来筹集这些资金呢？……只有由中国人民自己节约……而要人民节省出大量的资金，就不能不影响人民生活水平提高的速度，就是说，在最近一二十年内人民生活水平提高的速度不能不受到一些限制。这并不是为了别的，只是为了创造劳动人民将来更好的生活”。(85)1955年7月31日，毛泽东强调：“为了完成国家工业化和农业技术改造所需要的大量资金，其中有一个相当大的部分是要从农业方面积累起来的。这除了直接的农业税以外，就是发展为农民所需要的大量生活资料的轻工业的生产，拿这些东西去同农民的商品粮食和轻工业原料相交换，既满足了农民和国家两方面的物资需要，又为国家积累了资金。”(86)可见，在集体化时期，人民生活水平的提高和加快工业化进程是矛盾的。国家从长远考虑，只能牺牲人民生活水平的快速提高。

1960年，《中共中央关于农村人民公社分配工作的指示》指出：中央原来规定的总扣留占40%左右，分配给社员的部分占60%左右。如果当地收入水平较高，如每人分配在100元以上的，扣留可以多于40%；如果收入水平较低，如每人分配在50元以下的，扣留可以少于40%。87也就是说，正常情况下，人民公社要向国家和集体贡献大约四成左右的劳动成果。虽然人民公社制度在不断调整，但这一核心规定一直贯穿于集体化时期。1974年玉林地区分配给社员的部分占总收入的比重只有53.94%，该地区当年分配给社员的部分占比最高的是平南县，为55.77%，最低的是陆川县，为48.54%。88当“上下左右向生产队伸手，四面八方挖生产队墙角”89时，社员辛苦劳作一年，分配总量甚至不足一半，超支户怎能不多？

除了生产水平低、生态环境制约和国家、集体抽取过多物资，还有一个重要原因直接影响超支户的数量，即人民公社的分配制度。分配制度是生产关系的一部分，采用什么样的分配制度取决于生产发展的水平。1962年通过的《农村人民公社工作条例修正草案》指出，粮食分配应根据本队的情况和大多数社员的意见，分别采取各种不同的办法，可以采取基本口粮和按劳动工分分配粮食相结合的办法，也可以采取按劳动工分分配加照顾的办法等。不管采取何种办法，都应该做到既要调动大多数社员的劳动积极性，又要确保困难户能够吃到一般标准的口粮。90虽然国家要求生产队要遵循按劳分配、多劳多得的原则，避免分配上的平均主义，但是在实际分配中，基本口粮占比往往较大，很难进行真正意义上的按劳分配。

“在目前口粮不高的情况下，必须首先保证各等人口留粮放在安全线上，过分强调多劳多吃，是不符合粮食分配原则，是不正视当前粮食状况，是没有全面了解社员的要求，其后果，必然引起今后粮食安排的被动，亦不能达到发挥全体社员的劳动积极性。”91所以，华六大队在集体化时期粮食分配的70%按人口定量分配，30%按劳动工分分配。在农业生产水平较低的情况下，生产队首先要保证每一位社员都有口饭吃，也就是学者们所说的生存伦理92，当社员的基本生活都难以保障时，生产队就会面临解体的风险。如果国家政策允许生产队切实贯彻按劳分配，多劳多得，不劳动者不得食的社会主义分配原则，社员的生产积极性可能会大大增加，超支户的数量也可能会减少，但是也可能会导致部分农户的生活非常困难，甚至饿死人。这样的结果不仅国家政策不允许，熟人社会中的道德规范也是不允许的。虽然按三七开的比例分配物资具有一定的平均主义倾向，但它在保证大部分人的基本生活和激励劳动力积极出工参加生产活动上较好地进行了平衡。

三、小结

第一，人民公社为广大乡村提供了丰富的公共产品，内容涉及社员生产、生活的方方面面。与当下的政策不同，集体化时期的公共产品均由生产队或生产大队自我供给，生产、运输、管理、消费等各个环节都在本地进行，并没有获得足够的财政和物资支持。然而，这恰恰表明集体经济具有社会经济的属性，即经济活动和参与经济活动中的人及其所在的社会网络是紧密地结合在一起的，它们是相互嵌入的关系，集体经济的效益最终是让所有社员都能够受益，而不是像资本主义经济那样，脱离地方社会和文化，以攫取地方社会资源为目的进行经济活动，虽然经济效益非常可观，但是将所有的问题和矛盾都遗留给当地，以竭泽而渔的方式破坏当地的可持续发展。潘毅认为，社会经济的要旨，就是以人为本，立足社区而不是让资本剥削社区，互助合作，民主参与，人类与土地和谐共生。生产不是为了消费，而是为了解决民生，追求共同富裕，是一种多元化的社会所有制。在本质上，社会经济不是服务于资本累积，而是将社会重新嵌入社会关系中的一种新形态的经济模式。93

正如毛泽东在《中国农村的社会主义高潮》编者按中所言：“人民群众有无限的创造力。他们可以组织起来，向一切可以发挥自己力量的地方和部门进军，向生产的深度和广度进军，替自己创造日益增多的福利事业。”94作为社会经济重要组成部分的集体经济，在生产力发展水平较低的条件下，在农村修建了大量水利设施，尽可能地提高了土地生产效率，同时增强了生产队抗灾、救灾能力。此外，人民公社还广泛组织群众发展基础教育事业和医疗卫生事业。这些福利事业不仅价格低廉，而且在广度和深度上都动员了社员进行自我教育、自我成长和自我保健，满足了社员自身发展的需要。事实上，农民在集体化过程当中所受到的洗礼要远远高于笔者所看到的，包括管理水平、纪律教育和科技创新等，所有的这些都在塑造着“新型农民”，为改革开放后国家的飞速发展，提供了优质劳动力。所以，笔者以为，要实现乡村的再次振兴，必须把广大人民群众重新组织起来，使经济回归社会，尤其是作为社会经济的集体经济，这是一条可供选择的路径。

第二，通过研究发现，人民公社时期的农业生产效率客观上的确存在效率低下的问题，例如人们的收入水平较低，生活条件改善缓慢等。但是，在这些事实背后蕴含着错综复杂的原因及逻辑，当本文剥离这些原因后再度审视集体经济制度时，发现导致社员收入不高的原因是工分被稀释了。农户总收入计算公式能很好地对此进行说明。

由于农户的劳动力在一年或者数年内，基本保持不变或者变化不大，所以，农户总工分事实上是在相对平稳的区间内浮动。因此，影响农户总收入的因素主要是生产队的工分值。而导致生产队工分值变化的因素主要有两个，即生产队的纯收入（总收入－生产成本）与生产队总工分数。当纯收入保持不变时，生产队的总工分越多，即分母越大，工分值越小；当总工分数保持不变时，纯收入越少，工分值也会随之变小。所以，工分稀释化主要包括两个方面，一方面是工分的直接稀释，即把非农业生产的工分拿回农业之内进行分配，从而导致工分被稀释，分值下降；另一方面是实物和现金等物质上的间接稀释，即从生产队中抽走、消耗大量物资，减少生产队的纯收入，进而降低了生产队的工分值。如果把各级单位强加在生产队身上的各种“包袱”给抛弃掉，工分值和社员所得将会大大提高。
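按照上文的表述，这里的计算关系可以概括为：工分值等于生产队纯收入除以生产队总工分数，农户总收入等于农户总工分乘以工分值。下面用一个极简的草算来演示直接稀释和间接稀释各自的效果；函数名和全部数字都是为说明而虚构的假设，工分以劳动日为单位计，并非任何生产队的实际账目。

```python
def workpoint_value(net_income, total_workpoints):
    """工分值 = 生产队纯收入 / 生产队总工分数。"""
    return net_income / total_workpoints


def household_income(household_workpoints, value):
    """农户总收入 = 农户总工分 × 工分值。"""
    return household_workpoints * value


# 以下数字均为虚构，仅用于演示
net_income = 10000.0        # 生产队纯收入（元）
farm_workpoints = 20000.0   # 队内农业生产所记工分（劳动日）
household_points = 300.0    # 某农户全年工分（劳动日）

base = workpoint_value(net_income, farm_workpoints)

# 直接稀释：非农业用工的4000个劳动日拿回队内参加分配，分母变大
direct = workpoint_value(net_income, farm_workpoints + 4000.0)

# 间接稀释：被抽走2000元物资，纯收入减少，分子变小
indirect = workpoint_value(net_income - 2000.0, farm_workpoints)

for label, value in [("基准", base), ("直接稀释", direct), ("间接稀释", indirect)]:
    print(label, round(value, 3), round(household_income(household_points, value), 2))
```

在农户自身工分不变的前提下，两种稀释都会压低工分值，农户总收入随之同比例下降，这与上文“分母越大，工分值越小”“纯收入越少，工分值也会随之变小”的说法相对应。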

第三，在学界，对人民公社批判最多的就是平均主义和“大锅饭”，其中“大锅饭”几乎成了人民公社的代名词，污名化非常严重。笔者以为把造成平均主义的原因归结为人民公社制度本身是值得商榷的。因为“人民公社低效率的原因是综合的，既有公社自身的原因，也有公社自身之外的原因，但公社自身之外的原因是主要原因，而不是相反”。95当国家和集体从生产队拿走过多的剩余产品时，可供分配的产品自然不足，人均占有量也就无法提高，如此才导致所谓的平均主义。经研究，本文发现，在生产力、人力和物资都非常有限的条件下，人民公社的农业生产仍能保持较平稳的增长，实属不易。同时，人民公社为支援国家工业化建设，提供了大量公购粮和农副产品；为满足人们的生产生活需要，在农村地区提供了丰富的基建、教育、医疗和社会保障等社会公共品。可以说，它的效果是多元的。因此，对人民公社的评价，不能仅局限于某一方面或某一时段，而应放大到整个国家层面和历史的脉络中进行考究，才能得出较为客观的结论。

本文转自《开放时代》2024年第6期

2024-12-03
熊谋林：“实证法学”的概念术语回顾与回归 ——基于文献的实证法学研究整合路径

一、引言

近十年来，法学界关于实证研究范式和思想的学术讨论异常活跃，各大期刊均围绕相关概念以专题形式发表论文。就在实证法学家们关于如何定义、命名传统概念争论不休时，出现了大量基于裁判文书网和各大平台所发布的大数据而创造出的新兴概念。与此同时，诸多研究用新的范式、范式革命等新词，实现实证研究在近十年脱胎换骨的创新或改变。然而，关于新旧范式的结论已遭到学者的质疑，曾赟就指出“认为实证法学是近来才兴起的法学新范式的观点是值得商榷的”。①当然，近十年的这一场牵涉术语和概念的争议，既有语言翻译和学科习惯问题，②也有可能来源于学者的“代际之争”，③还有可能是“概念泛化”“名副其实”“空谈”的玄学。④但最重要的原因，或许是法学家们对计算机技术过于崇拜，或对新技术、新产品表现出学术恐惧，从而造成对实证分析的基本对象和进路理解过于激进。不少实证法学家们都陷入足够大的样本或全样本的技术陷阱，以为只有这样才能构造近似客观真相的大数据。然而，夏一巍的研究却反映出，大样本对于实证研究没有必要，只需500个随机抽样样本就可达到与几十万样本近似的分析结果。⑤
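随机抽样为何可以逼近全样本的结果，可以用一个小模拟来直观感受。需要强调的是，下面的代码并不是夏一巍研究所用的方法或数据，只是一个示意：总体规模、“某种裁判结果”的真实比例以及函数名都是虚构的假设。

```python
import random


def simulate(population_size=300_000, true_rate=0.3, sample_size=500, seed=42):
    """在虚构总体中比较全样本比例与简单随机抽样比例。"""
    rng = random.Random(seed)
    # 虚构总体：每条记录以 true_rate 的概率取 1（例如出现某种裁判结果）
    population = [1 if rng.random() < true_rate else 0
                  for _ in range(population_size)]
    # 不放回地简单随机抽取 sample_size 条记录
    sample = rng.sample(population, sample_size)
    full_rate = sum(population) / population_size
    sample_rate = sum(sample) / sample_size
    return full_rate, sample_rate


full_rate, sample_rate = simulate()
print(f"全样本比例: {full_rate:.3f}  500个随机样本比例: {sample_rate:.3f}")
```

对比例估计而言，样本量为500时的标准误大约在两个百分点左右，所以抽样结果通常落在全样本结果附近；这一点成立的前提是抽样确实随机，而不取决于总体有多大。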

事实上，作为方法的实证研究与其他研究方法一起，在“文革”结束后的法制建设过程中即被广泛讨论。那个特殊时代所讨论的法学研究方法，丝毫不比今天逊色，甚至堪称更加出彩。学者们从法学内部拓展到外部，寻求与自然科学和社会科学相结合来繁荣和发展法学研究。这一时期主要围绕计算机和定量分析而展开，并由此直接产生了“数量法学”⑥和“电脑法学”⑦。这两概念从开始就超越研究方法的范畴，被作为一个学科构想而提出，并进一步推动与实证研究相关的讨论。

遗憾的是，无论是研究方法还是学科概念，当前所讨论的实证研究概念几乎没有注意到二十世纪八十年代的讨论。从某种意义上说，忽视二十世纪八十年代的这场宏伟、壮丽、深邃的讨论，也是穿新鞋走老路的关键。今天所呈现出的学术盛况，是否真的能达到真知灼见的程度，可能需要打上大问号。法学领域的实证研究概念之争，恰恰可以归结于没有进行文献回顾的主观论述。然而，没有文献回顾的概念之争，在理论和学科意义上不仅对实证研究没有任何好处，反而会制约实证研究的发展，其所建立的概念范畴本身更是沙滩上的大厦。一方面，欠缺原始或初期概念的细致研究，仅凭一种想象的概念和领地之争，显然无法综合评估各种概念的来龙去脉。另一方面，实证法学家们提出的以自我为中心的概念，甚至出现与自己先前的学术立场相左，或者与文献不符的失真论断。以左卫民为代表的法律实证研究和以陈柏峰为代表的法律经验研究，⑧主要或直接将苏力的“社科法学”定位在质性研究、个案研究、田野研究上，并由此引发法律经验研究是否是实证研究，以及社科法学和自科法学之争。然而，苏力在提出社科法学时的表述和表达上，不仅肯定社科法学是实证研究，更高度肯定基于数学、统计上的定量研究。⑨

总体来看，这一场关于如何命名法学领域的实证研究的大讨论，虽然法学核心期刊的论文产出绝对可观，但方家大论基本以自己的学科、背景和理解构筑概念。可怕的是，实证法学家们所营造出的概念争议，事实上也成为实证研究阵营分化的起点。⑩其结果是，在实证研究尚未成形或成为可接受的研究方法之前，出现了力量分散的学术阵营分化。是故，程金华呼吁保持“开放、多元、互补、合作”的学术共同体，(11)尤陈俊也有关于“彼此尊重，砥砺前行……相互学习、借鉴和融合”的评论。(12)

面对这些概念或术语相互分离、山头并立的局面，已有学者试图从不同角度解决实证研究领域的术语问题。这主要表现为分离和整合两条路径。分离路径，试图缩小“实证研究”的范围，从而将不属于实证研究的范式排除在外。整合路径，试图用新术语的内涵和外延去解决先前术语的问题。然而，无论哪种路径，各种概念论者仍以自己的学术立场或倡导为中心，展现出相互攻击、自我否定、互相蚕食的学术生态现象。

就分离路径来说，主要表现为曾经趋同于实证研究的学者，逐渐用自己的新兴概念，将自己从传统实证研究中分离出来。例如，左卫民笔下的法律实证研究长期被定义为以“数据”为核心的定量研究，甚至直接用“前统计法学”“计量法学”“定量法学”来概括。(13)近年来，左卫民不仅认为计算法学“可以视为法律实证研究的衍生或者2.0版”，(14)而且创立以大数据为中心的“自科法学”，从而与传统实证研究的小数据和社科法学基于个案的“‘实’而不‘证’”相分离。(15)曾赟在论述数据法学应该是独立于实证法学、计算法学的新学科时，也提出“不宜将定性研究归于实证法学研究”。(16)张永健和程金华在探讨法律实证研究的内涵时，本意是用“法律实证研究”的两种形态来整合“实证法学”和“实证社科法学”，将定量和定性的研究都放在实证研究体系之下，但他们围绕“是否应用社会科学的范式”所创造的两种概念，事实上又成为加剧社科法学和法律实证研究差异的重要诱因。(17)与此同时，侯猛注意到法律实证研究的名称问题，鼓励用“实证研究”“经验研究”“定性研究”或“定量研究”来区分各自的差异。(18)陈柏峰为建立田野调查的质性方法坐标，创设基于质性的“法律经验研究”，刻意区别于以定量为基础的法律实证研究。(19)

就整合路径来说，各种新兴概念都在试图统筹和统一其他既有概念。左卫民基于法律大数据时代的特性，认为人工智能法学、计量法学、计算法学的概念周延性“值得推敲”，“自科法学”更加妥当。(20)马长山在认识到“近年来各地设置了名目繁多的新兴学科，如互联网法学、信息法学、人工智能法学、数据法学、计算法学、认知法学、未来法学等”后，提出应当将这些新名称统一为“数字法学”。(21)马长山的观点得到姜伟的支持。(22)胡铭在谈到数字法学的相关概念时，认为其包括网络法学、数据法学、计算法学、人工智能法学“等基本板块”。(23)苏宇却试图把数据法学、网络法学、互联网法学、网络信息法学、数字法学、计算法学、人工智能法学等新概念用“信息技术—法学”融合在一起。(24)刘艳红在谈人工智能法学领域的名称不一、内涵不清、学科归属不明的问题时，将网络与信息法学、数字法学、大数据法学、计算法学等统一在人工智能法学之下，并在“法学一级学科之下设置全新的二级法学科”。(25)肖金明、方琨在高度赞扬计算法学时，认为这是“对人工智能法学、数据法学到数字法学的理论概括”。(26)

本文不希望从概念术语发展体系上提出新概念，而是告诉读者这些概念的前世今生，并基于整合路径重申什么术语才是统一法学领域的实证研究、实证法学学科并促进其发展的最好术语。

二、中国实证研究的当代起源：钱学森的系统工程及其影响

(一)系统工程下的法治系统工程学和系统法学

从知网文献来看，源于二十世纪八十年代的法学研究方法的讨论，主要是关于数学研究方法和系统科学的讨论，这其中不乏包含实证研究的宝贵结论和分析。

1979年，钱学森先生在其系统工程的总体框架下，号召建立包括法治系统工程在内的14个系统工程。(27)受钱学森的影响，吴世宦发表了《建立我国法治系统工程学浅议》一文，围绕数学表达式、数据表格或网络图形、语言方式模型，呼吁建立评价法治状况好坏的法治模型评估和法治系统工程学。吴世宦认为，法治系统工程，需要用模型和最优化解决。他提出一个可科学表达法治状态和法治状况的数学模型，因为这有利于研究思考问题、集体讨论协调、应用计算机、定性定量分析、建立通常方法。他认为法治系统工程主要是对法治问题作出治乱预测、系统分析、方案评比、政策评价，并给出符合法律制度方案的最优决策。(28)

针对吴世宦的论文，钱学森指出“系统工程如同土木工程一样，是直接改造客观世界的，是技术工作，不是什么‘学’；围绕有关法治的模型建构问题似乎是个社会学的问题”。(29)钱学森的评价和建议对吴世宦有所触动，他们相互折中，随后合作发文，号召使用电脑和系统工程的方法，建立社会主义法治系统工程。(30)他们强调使用电子计算机和系统工程的方法，应用电脑办理案件、检索和检查典型案例、建立犯罪治理工程和法律咨询中心，对法律进行纵向和横向系统性分类。(31)

紧接着，钱学森将他和吴世宦的文章总结为6条，包括：建立法制信息库，把资料、法律、法规、规定、案例等存入库里；将信息库用于法制工作中，检索资料、情报、档案，以提高律师工作效率；运用普遍正在搞的人工智能、知识工程和专家系统技术；利用计算机建立系统识别技术，识别办案线索，理出真实案情；利用计算机检索法律，识别出法律漏洞，建立完善周密的法制系统；建立法制和法治系统和体系，但需要做具体工作。(32)

自此以后，广大研究者不仅深入讨论法治系统工程，(33)而且从方法论阐述系统科学或系统论对法学研究的意义。(34)夏勇和熊继宁等尤其在谈论系统科学方法引入法学领域时，分别肯定了以中国法学现状、实践第一、以经验材料为基础的“三论”科学思维。(35)韩修山在讨论信息论、系统论、控制论对法学研究的影响时就提出，“科学研究的内容由对事物及其运动规律的定性分析转入定量分析，日趋数学化、精确化”，并论述法学不能脱离“三论”。(36)自此以后，法学研究方法的讨论如火如荼，但基本没有偏离钱学森和吴世宦所设计的框架。

吴世宦的专著《论法治系统工程学》虽然并没有给出法治系统工程学概念，只定义了如系统、工程、系统工程等相关的内容，但阐明了系统工程事实上就是应用定量研究，“从应用的角度来说，系统工程实际上就是定量化系统思想方法的实际应用”。(37)由此可见，“法治系统工程学”有着明显的实证意蕴。

(二)作为独立学科的数量法学与电脑法学

1985年4月26-28日，中国政法大学法治系统科学研究会与中山大学法治系统工程研究会联合发起的全国首次法制系统科学讨论会举行。这次会议是“把以系统论、控制论、信息论为代表的现代科学成果引进法学研究和法制建设领域的初步尝试”。(38)钱学森受邀参加此次会议。钱学森基于数学方程、数学模型、电子计算机模拟建立法学系统工程的理论，提出需要“把法学这门学问现代化”的宏伟设想。在具体路径上，他明确指出“要用电子计算机，就是要定量”。他给这门现代化的学问命名为“数量法学”，具体依据是数量经济学和中国社会科学院已经成立的数量经济研究所。钱学森认为会议只是开端，请司法部部长邹瑜“下决心建立个研究单位”，因为“需要一支强大的队伍”。(39)

从知网文献来看，作为方法的法学领域的实证研究，最初以钱学森命名的“数量法学”而提出。这个概念从提出时就具有法学学科下的二级学科概念。宋健明确把社会科学的定量研究方法纳入其中。(40)与此同时，吴世宦对以计算机为核心的法治系统工程学也有不同理解。他提出“电脑法学”概念，认为其“以研究电脑与法律的相互关系为对象，是运用系统科学思想研究电脑在法学领域的应用保护和发展的原理和方法，探索法治最优化途径的科学”。(41)后来，吴世宦等提出了利用数学模型建构法治系统和系统工程，包含法律规范、行为和心理控制、经济法、青少年犯罪治理、量刑、森林经营系统、整治“不正之风”系统工程的模型和模拟。(42)

(三)计算机主导下的数学方法和定量分析

二十世纪八十年代所讨论的系统科学思想，以计算机、数学和定量分析为基础，为整个法学界和法学家展现了全新的视角。较早参与计算机法律话题讨论的，应该是龚瑞祥和李克强。他们在1983年发表的《法律工作的计算机化》一文中详细介绍了西德、苏联、美国运用计算机处理法律工作和资料的方方面面，充分肯定未来的世界里计算机将参与到每个工作环节，强调计算机的定量分析重要性。“现代社会和科学的发展，还要求进行定量分析，要求有系统的观念，用复杂的系统来如实地反映复杂的系统。”计算机引入法学研究后，才可以对各种复杂因素展开定量研究和系统分析，因为“法律现象作为一种社会现象十分复杂，数据庞大，随机因素很多”。(43)他们将这种变革，称为“法律科学方法论的革命”和“社会科学化”“法律工作计算机化”的新纪元。(44)应当承认，龚瑞祥和李克强的这篇论文受到钱学森和吴世宦的影响，文中不仅多次提到控制论、信息论、系统论、法学控制论、法治系统工程学等内容，而且高度肯定计算机参与整合、运算、处理资料和情报等法律工作。

值得注意的是，这一时期各种法学研究方法的讨论呈现出两个特点。一是基本每篇关于方法论的文章，都要提到数学和计算机运算的影响。(45)二是一些文章专门探讨数学方法在法学研究中的影响和运用。(46)计算机技术所承载的定量分析或定量研究，几乎是所有方法论文章反复论述的内容。

沈志坤总结了二十世纪八十年代法学研究的十大新趋势，其中有三点与实证研究显著关联。第一，多学科的综合研究，将数学和预测等新自然科学技术嵌入到“纯法学”中，法学与经济学、社会学等人文社会学科结合。第二，开始注重定量研究。第三，研究手段的更新，主要表现在信息化收集、处理和法学研究从个体走向集体研究。(47)孙国华也总结出二十世纪八十年代法学研究的四个新趋势，每个都事实上有实证研究的味道。一是社会学化，即把法律现象作为社会现象对待，运用社会学方法来研究；二是数学化、科学化，即把数学方法、现代一般科学方法引入法学研究，采用包括计算机在内的储存和处理资料的手段实现数量和定量分析；三是多方面性和综合化，即对法律现象进行多方面综合研究；四是大科学化，从个人朝集体合作研究发展，吸收经济学家、社会学家、心理学家、数学家、统计学家和其他专家。(48)

钱学森的系统工程和吴世宦的法治系统工程(学)，启迪着改革开放后的一批批法学家。正如熊继宁在缅怀钱学森的讲座中总结的那样，“现在看来，钱老的学术思想，仍具有相当的超前性，我们至今仍在为实现他当时的设想而努力”。(49)也正因为“超前性”，钱学森提出的数量法学和吴世宦的法治系统工程学、电脑法学，在那个刚刚恢复学术生机的年代并没有成形为学科。(50)但这场借助自然科学和数学的方法论探讨，对跨学科、跨专业的交叉研究影响深远，尤其是以数学或定量研究和分析为核心的方法更是广泛衍生到具体的法学科目，甚至提升到法学教育和法制建设的历程中。(51)

(四)实证法学与相关衍生概念

尽管早期的方法论讨论并未用“实证法学”名称，但实证思想经过系统论或系统科学的讨论后，各种文章逐渐加入“实证”或“实证研究”两个词。(52)熊继宁在谈到法学理论的危机时，指出“原始社会有没有法律，并不是靠纯粹思辨所能解决的，它必须借助于实证研究的成果才能说明。但是到目前为止，几乎没有人进行实证的研究”。(53)季卫东和齐海滨对声势浩荡的系统论方法首先提出质疑，转而将实证和实证研究全面纳入法学研究方法，并首次给实证研究做了定义。他们讨论的内容异常丰富，其文章以“实证”为高频词，分别提到7次“实证研究”、10次“实证主义”和6次“实证主义法学”，应该算是法学领域实证研究的里程碑式开端。(54)

经过长达多年的讨论，或许受季卫东和齐海滨论述的影响，葛洪义在他的文章《实证法学和价值法学的协调与我国法学研究》中正式命名了“实证法学”。他明确提出了作为与规范或价值法学相对应的“实证法学”概念，并定义或强调“实证法学侧重于用科学分析和逻辑推理的方法研究现实中的法律、法律规范和法律制度”。(55)

由此以观，当代中国法学研究朝科学化道路的迈进史，事实上就是一部实证研究的发展史。作为方法的实证研究，其开端必然与数学、计算机、系统工程、定量分析四个科学命题不可分割。以实际、实践、实证为核心的法学科学化思维并非偶然，既有来自外部学科的影响，也有法学家从法社会学或从法理学角度的内部呼唤。作为学科概念的实证法学，除了数量法学和电脑法学外，还涌现出以实证研究和定量分析为核心的诸多概念，如科技法学、计量法学、系统法学、综合法学、信息法学、司法统计学、法律计量学、计量法律学。(56)虽然这些概念本来同出一脉，但名称却比较混乱。徐永康充分注意到这个现象，并明确指出这是由于观察角度不同，在引进外国的概念翻译过程中因使用习惯和学科用语而出现差异。(57)这一评价颇为中肯，用在过去几十年都一点不为过。

三、传统概念体系下的相关术语

(一)实证法学

实证法学，也有用作“法学实证”。关于实证法学，如前述，葛洪义早在1987年就提出这个概念。(58)但是，这个概念在更早时候作为“实证主义法学”的简称。(59)可能也正因为如此，季卫东和齐海滨才在文章中广泛讨论法律实证主义、实证主义法学和实证研究。(60)直到今天，仍有成果用“实证法学”讨论其在法理学方面的方法价值，(61)尤其是大量使用“分析实证法学”。(62)文献中能够查阅到的用“实证法学”作为标题的文章，主要是某主题的具体研究或在注释法层面运用。(63)值得注意的是，澳门大学法学院最近成立了以刘建宏为主导的“实证法学研究中心”，并在澳门政府的支持下成立“实证法学中心实验室”，旨在将人工智慧和大数据处理技术融入法学研究方法论中。(64)

熊秉元和叶斌在探讨法律经济学、法律和经济时，认为“实证法学是由实证、而非规范的角度，构建法学理论，采取的方法论是‘先了解社会，再了解法律’”。(65)但他们没有具体讨论“实证法学”概念。张永健和程金华在论证法律实证研究的概念和外延时，认为法律实证研究包含实证社会科学和实证法学，两者差异表现为英文和中文学术的差异。他们笔下的实证法学是“仅对法律进行实证分析”“仅对法律现象做实证分析”“一种是不应用社会科学范式，但运用资料对法进行实然分析”“只研究法律相关的事实问题”“与法学以外的问题或者知识并没有直接的关联”。(66)笔者在早期讲座中，从研究层面使用和解读了“实证法学”概念，“一切致力探索事实真相、证明或解读法律运作机制等研究，都是实证法学研究，具体包括访谈、问卷调查、案例分析、大数据研究等”。(67)最近，由丁文睿翻译的论文再次使用了“实证法学”概念，但文中也同时使用“实证法律研究”。(68)

《法学研究》在2013年第6期刊发左卫民和黄辉讨论“法学实证研究”的两篇文章，但均未涉及术语概念，而是直接围绕实证研究展开讨论。(69)尽管如此，左卫民认为“实证研究则是在社科法学的基础上，强调基于实证数据来真实、准确、全面地把握某种法律现象，并在此基础上或进行深度阐释，或提出法律改革建议”，并要从“‘前统计法学’提升到‘计量法学’”，作根本性提升。(70)徐文鸣在论述“法学实证研究”的概念时，借助文献指出“法学实证研究是一种归纳推理的方法，从广义上看包括任何系统地收集、整理和分析信息(数据)的研究”，并在区分定性和定量基础上，提出定量分析是狭义的法学实证研究，“强调遵守统计学的基本原则和程序，收集、处理和分析大样本数据”。(71)

(二)实证主义法学

实证主义法学，早期在法理学内部作为方法讨论，故也常用成“法学实证主义”或“法律实证主义”。究其本质来说，实证主义法学仍然是实证研究。如前述，季卫东和齐海滨围绕实证主义法学、法律实证主义和马克思主义的实践法学来讨论实证研究。(72)刘同苏在系统论述法理学上的学派概念时，用“法律实证主义”来区分自然法学派，他将法理学从哲学独立成为自成学派的学科归功于法律实证主义，尤其是将奥斯汀称为法律实证主义的第一代大师，将其功绩定位为描述了“法理学的对象是实在法”，将凯尔逊的贡献总结为表达了“只有实在法，才是纯粹法学的对象”。(73)刘同苏所运用的“法律实证主义”虽然限定于法理学贡献，但其论述法律实证主义的实证方法时又用“实证就是现实的验证，就是客观试验”。(74)

新近几年有学者将实证主义法学作为实证法学家所理解的“实证研究”来讨论。何柏生从数学角度论证实证主义法学时，就认为“实证主义法学是一种描述性的法学理论，重视逻辑分析方法和量化分析方法，摒弃法的价值，将法学的研究对象限定于实在法领域”。他认为法学要想科学化就必须数学化，因为“法学问题的不断定量化才是法学不断走向科学化的关键”。(75)虽然何柏生的论述反映出定性和定量研究的双举，但他所定义的实证主义法学更多是定量研究。

事实上，法理学界所用的法学实证主义、实证主义法学、法律实证主义，应是对英文positive law或者legal positivism的翻译有些问题，更准确的含义可能是存在法、实在法、成文法，或法律存在/实在/成文主义。其本来的含义，大概是法律是“制定(laid down and set firmly)”或“存在(exists)”的法。(76)然而，实证主义法学从实证角度理解法，反而在概念上阴差阳错地走向了以实证为核心的实证法学道路。其最大的贡献，是讲明了法律的起源和法律的内容。例如，博登海默在评价奥斯汀的positive law时指出，“奥斯汀希望将普通法排除在成文法之外，因为普通法并不能归结为是君主的命令”。(77)也正因为这样，奥斯汀所提出的法律是由一个君主(或他的代理人)所发出的命令，才招致legal positivism是关于“独裁”的批评。(78)

虽然自白建军提出实证分析和实证主义哲学完全是两回事开始，后续研究在论述相关概念时也都标榜实证研究与实证主义法学有区别，(79)然而，诚如季卫东和齐海滨所描述的一样，实证主义法学只不过是实证研究的早期形态，只是表达不同而已。尽管翻译词汇在汉语中已经形成习惯，并且一时半会改不了，但这却歪打正着地肯定了法学研究的实证精神，实证主义法学从一个纯法理学问题上升到各种法律部门的实证研究。因此，实证主义法学的本质依然是以实证为核心的法律或法学研究流派，故本文将其放在传统概念中讨论。

(三)法律实证

法律实证，也有用作“实证法律”，在早期作为“法律实证主义”的法理学派出现。作为耳熟能详的实证研究的方法术语，它多以研究或分析作后缀，但差不多到2000年后才得以大量使用。

从知网检索情况来看，白建军在区分“实证”和“实证主义”关系后，较早地提出法律实证分析概念，将其作为一种分析和研究方法呈现。(80)在后来的专著中，白建军将“法律实证分析”提高到“法律实证研究”。(81)他注意到法律实证分析与“实证主义法学”或“实证主义哲学”的联系，“都强调感觉、经验、客观观察在认识活动中的重要性”，(82)但他的贡献是明确将“法律实证分析”区别于法理学上的“实证主义法学”。他认为，实证主义是对世界理论认识的哲学思想，是从事科学研究活动的成果；实证分析是研究方法、认识工具，是获得理论认识所凭借的工具；实证分析不同于实证主义哲学，法律实证分析也不等于实证主义法学；法律实证分析只是法学研究的一种具体方法，不是一种独立的法哲学或法理学理论。据此，他认为，“所谓法律实证分析，是指按照一定程序规范对一切可进行标准化处理的法律信息进行经验研究、量化分析的研究方法。也可以说，法律实证分析就是其他学科中实证分析方法向法律研究的移植，借助实证分析方法改造法学传统研究模式的一种方式”。(83)值得注意的是，白建军笔下的法律实证分析包含定量和定性的研究，绝不能误认为他的法律实证只包含定量研究。他关于法律实证分析的三个要素，两个分别指向了“经验”和“量化”。

左卫民认为，法律实证研究，“本质上是一种以数据分析为中心的经验性法学研究。详言之，就是以法律实践的经验现象作为关注点，通过收集、整理分析和运用数据，特别是尝试应用统计学的方法进行相关研究的范式”。(84)他笔下的法律实证研究定位在“以数据分析为中心的”定量研究中，并明确肯定“可以认为是一种‘定量法学’”。(85)在进一步解读后，他认为“法律实证研究是一种法学研究范式，其研究对象和研究方法与具有‘血缘关系’的经验研究存在较大差异”。(86)

张永健、程金华从简单和复杂的二维层面，论述法律实证研究涵盖“法律+X”的实证社会科学和只对法律作实证分析的实证法学。他们将法律实证研究定义为“研究和‘法’有关的各种事实”“只要应用资料的(定性或者定量)方法去分析法律”。(87)他们将法理解为广义的“法”，包含立法者制定的法律，行政机关制定的规章，法院的判决，社会规范，以及与法有关的人，并认为法律实证研究包括定性研究和定量研究两种范式。(88)总体来看，他们试图用“应用资料”来整合社科法学和定量实证研究的道路值得肯定，但使用的各种概念内涵不明，甚至因交叉使用而疑问处不少。

陈柏峰将法律实证研究限定为“对法律问题的定量实证分析”。他明确指出，“法律实证研究以法律规范为参照，通过逻辑演绎来说明变量之间的规律关系，通过中立观察所获取的数据来验证理论假设，用数据统计方法分析法律现象中的数量关系”。不仅如此，他还将法律实证研究限定在“大样本”中，“法律实证研究强调针对研究对象收集较大范围内的样本和数据，根据大样本数据的分析得出结论，阐述因果关系”。(89)

(四)社科法学

学界公认苏力是“社科法学”术语的提出者。苏力所表述的社科法学，仍是实证研究的一种称谓而已，而且更多是参考法律经济学、数学、统计学论证定量研究。其文章《也许正在发生——中国当代法学发展的一个概览》出现“社科法学”13次，“实证研究”出现6次，“经济学”出现4次，“数学”和“统计学”各出现2次以上。例如，苏力指出，越来越多的学校和课程讲授分析论证的方法，数学公式普遍进入教室，更多学者注重当代社会科学的实证研究传统。他把这种现象总结为社科法学派，最大的共同特点是，从法律话语与社会实践联系起来考察其实践效果，侧重于用实证研究去发现因果关系，发现法律实践的制度条件。(90)

然而，这篇文章虽然反复论述社科法学，但苏力并未给“社科法学”下定义。可能也正是因为如此，社科法学才长期被误解为是只注重或侧重于田野调查的质性研究。从苏力描述的内容来看，他将社科法学和实证研究的框架勾勒了出来，尤其是肯定运用统计学对社科法学的重要性。他在肯定实证研究的贡献时，呼吁社科法学学者关注现实、注重实证研究，以此作出理论贡献。他基于“更多的专业化的实证研究成果的出现以及它们的方便获得”，对社科法学持乐观态度。他在谈到以统计学和定量实证研究的学术市场转变时，提出“我们的法律正处在一个向将由比喻意义上的统计学家和经济学家主宰的过渡期”，这更有利于社科法学的变化。(91)

在《法律与社会科学》创刊序里，苏力谈到中国法学界的转变时，再次提出必须以定量实证为中心大力度地促进社会科学的发展，“就是要实证……但更要注意现代社会科学的研究方法，包括统计分析和博弈论”。(92)这些内容再次高度反映出，苏力笔下的“社科法学”应当是定性和定量相结合的实证研究，而且更加注重定量研究。在苏力看来，只有“知识的转变和社会科学的兴起也才可能参与真正的世界性的学术竞争”，这是中国文明重新崛起的需要和必然。为此，苏力表达了《法律与社会科学》的宗旨和目的：“努力推动法学的经验研究和实证研究，推动法学与其他诸多社会科学的交叉学科研究。”(93)

自苏力提出社科法学后，有学者尝试解读社科法学。例如，王夏昊解读为“社科法学是指以其他社会科学的方法研究法律的学科的总称”。(94)更具体的内容，大概在2014年前后，才由苏力或侯猛单独或一起总结完成。在《法律与社会科学》2014年8月出版的年刊上，徐涤宇在“什么是社科法学”的框架下，提出“首先要讨论的问题是社科法学到底是什么”。侯猛据此正式解读出与王夏昊相似的社科法学的概念，但他的贡献是放在英语世界来理解这个含义。侯猛认为，英文概念“Law and Social Science”或“Social Science of Law”比中文概念“一种跨学科的研究或者说跨学科的知识，即法学和其他学科的知识”更加清晰。侯猛进一步强调社科法学的基调是跨学科，并因为对法学的交叉科学、跨学科法律研究、法律和社会科学、跨法学等总感觉不行，以及鉴于更简洁和上口的表达、更容易和法教义学对话等原因，“直接称为社科法学”。(95)

苏力在2014年9月发表的文章中，将社科法学全面解读为，“是针对一切与法律有关的现象和问题的研究，既包括法律制度研究、立法和立法效果研究，也包括法教义学关注的法律适用和解释，主张运用一切有解释力且简明的经验研究方法”。(96)与侯猛从知识层面来解读不同，苏力这次从“研究”和“经验研究”层面来解读，而且还包含他早期区分的“诠释法学”或“法教义学”。虽然苏力的概念中仍然有“一切有解释力且简明的研究方法”，文中也明确表态“社科法学强调并注重经验和实证研究”，(97)但这篇文章中没有再用“统计学”或“统计分析”等定量研究的相关词汇。在界定是否为社会科学研究上，苏力提出“社科法学的研究不应当仅仅以学者的学科出身来界定，而应当以其研究法律问题的思路和方法来界定”。(98)但学科出身和思路、方法到底是什么，经验研究和实证研究到底又是什么，苏力没有回答，这或许暗示苏力也已开始改变早期的社科法学含义，并支持法律经验研究与法律实证研究分离。

就在苏力发文的同一期学刊中，侯猛再次提出“社科法学的英文名称是Social Science of Law。中文直译‘法律的社会科学’，只是简称社科法学而已”。在文章里，他虽然澄清社科法学不能被“误认为是法学的分支学科”，(99)但在评论“不再是法社会学”时，事实上又在强调“法社会学转向社科法学”。(100)或许，从某种意义上来讲，社科法学可能正是法社会学的代名词。这样的论断从季卫东将社科法学和法社会学交替使用，也可看出其端倪。(101)6年以后，侯猛发文再次更新了社科法学的定义，将其描述为“法社科研究，全称是法律的社会科学研究(social sciences of law)，又简称社科法学，是指运用社会科学的知识和方法来研究法律问题”。(102)与之前社科法学处在云雾里不同，侯猛这时的解释应该才是最清晰和直接的。在文章中，侯猛提出了与苏力早先所强调的社科法学的进路有相似之处，但社科研究与实证研究的关系上却有所不同。一方面，他肯定社科法学和实证研究的通用含义，因为社会科学方法也包括定性和定量。另一方面，他又注意到运用自然科学进行法律实证研究不能与法社科研究等同，同时并非所有法社科研究都可称为实证研究，尤其是冯象和苏力的法律与文学作品不再是实证研究。(103)

总体来看，社科法学的定位、内涵、外延在不断变化中，但各种讨论都充分肯定社科法学与实证研究的相同或相似性。侯猛早期事实上一直在强调实证，不仅在文章标题中同时使用社科法学和实证，而且在内容上也肯定实证研究。同时，他在论述社科法学的优势时指出，“实证研究，也是社科法学相较于诠释法学的比较优势”，尤以“建设实证的社科法学传统”更明。侯猛虽然认为实证研究并不等于定量研究，更不能轻视定性研究，但也指出“实证研究的一个基本趋势是定量化”。(104)这些都反映了他对实证研究和定量研究的高度肯定。只不过，自从法律实证研究与法律经验研究分野以后，侯猛才呼吁分别使用定性研究或定量研究，实证研究或经验研究，同时将社科法学区别于法律实证研究。(105)

(五)计量法学

计量法学，顾名思义是因计量方法而产生，注重对具有数量变化关系的法律现象进行研究。后来，这个概念被作为交叉学科上的概念而提出。就中国大陆而言，虽然何勤华较早提出“计量法律学”概念，但“计量法学”这个术语是经由屈茂辉领衔的团队创办的“数理—计量法学”研究中心和论坛一步一步发展壮大的。沿用“计量法学”这一用语10多年以后，大概在2022年的第八届数理—计量法学论坛上，该术语被更名为“数量法学”。笔者特在2023年年会上请教为何要改名为“数量法学”。屈茂辉及其团队的解释是，“我们最初所称的‘计量法学’乃特别强调对具有数量关系的法律现象进行研究，乃借鉴计量经济学而来，但容易误解为是专门研究《中华人民共和国计量法》的法学分支。为了简便同时也为避免误解，就改为‘数量法学’”。这个名称的转变，恰好与钱学森提出的“数量法学”不谋而合。这也再次说明，本文将当代法学领域的实证研究的源头定位在二十世纪八十年代具有合理性。纵观“计量法学”或“数量法学”的轨迹，二者一直放在法学实证研究中，专门探讨定量实证研究的方法和学科概念。这可以从2022年年会会议综述的表达看出，(106)只不过不像前几届在主标题中加入法学实证研究。(107)

大概在2008年，屈茂辉获得资助，主持湖南省软科学“法学中的数理计量方法及其运用研究”项目，开始研究计量方法在法学中的运用。为此，屈茂辉在和学生合作的论文中，阐明“计量方法在法学研究中的运用，是指以一定的法学理论和统计资料为基础，综合运用数学、统计学与计算机技术，以建立数学模型为主要手段，研究具有数量关系的法律现象”。(108)在谈到法学研究中为什么需要计量方法时，他们认为这是法学研究对象的全面把握要求，是法律规则制定、适用、评价的科学化要求，是中国法学研究的国际化要求。难能可贵的是，他们在谈到法学计量方法与实证分析的关系时，明确提出计量方法“是实证分析研究范式下的较为普遍的方法或者一般方法，两者是种属关系”。(109)更为重要的是，他们将定性与定量分析作为增加法学科学性的共同路径，随着实证分析的地位提高，法学研究从描述性的定性分析层面走向定性和定量分析相结合的新层面。(110)然而，在论证计量方法初见端倪时，他们将源头定位在白建军的实证研究和李晓明的数学量刑、社科法学中的实证研究、刘瑞复的数量法学上。这也就说明，他们没有注意到计量方法就是钱学森和吴世宦笔下的系统工程、法治系统工程、数量法学所表达的定量研究方法。

2010年，屈茂辉和张杰首次阐述“计量法学”概念。与先前只谈方法不同，这次是从学科和方法两方面阐述。他们所描述的计量法学，无论是从方法还是学科上都没有离开“法学实证”这一关键词。在研究方法层面上，计量法学和传统法学不一样，“主要运用定量的研究方法并结合传统法学的研究方法进行法学研究”。在学科层面，计量法学“是一门研究具有数量变化关系的法现象的法学学科，它有其独立的研究对象和特殊的研究价值”或“是通过以一定的法学理论和统计资料为基础，综合运用数学、统计学与计算机技术，以建立数学模型为主要手段，来研究具有数量关系的法律现象的学科”。(111)

2012年，在计量方法方面，屈茂辉更加明确地在定量层面强调实证研究方法。所谓计量法学方法，是“实证研究中通过对研究对象的观察、实验和调查会产生大量数据，必须对这些数据进行统计分析，探寻各个影响变量之间复杂的因果联系”。(112)与此同时，他明确提出这种计量法学方法就是必须使用的定量方法，并阐述了计量法学、计量研究、实证研究对民法学研究的重要性。(113)同年，屈茂辉再次在学科概念下，围绕“大样本”全面阐述计量法学。“计量法学是指通过收集大样本数据，对具有数量变化关系的法律现象进行运用定量研究的交叉学科。它是一门独立的学科，其研究对象是具有数量变化关系的法律现象，研究方法是实证方法和计量方法。”屈茂辉阐明，计量法学的英文渊源是Lee Loevinger于1949年发表的Jurimetrics：the Next Step Forward。(114)在学科价值方面，他认为计量法学是对当前法学研究方法的创新，使中国法学向精细化方向发展，是实现中国法学的国际化途径之一。在实践价值方面，他总结了计量法学的三个贡献：计量法学自身确定的客观标准可作为社会控制和监督的工具，运用系统、实证的观测对其他权力的运用方式来实现；计量法学通过得到控制和监督结果进而反思效用，用定量方法来对政府政策绩效进行评估；计量法学改变了传统方式的法学体系，注重引入统计、计量和社会效果的预测评估方法，更加强调法学的定量、实证和技术性。据此，屈茂辉认为，计量法学是颠覆了传统法学的研究方法。(115)

2014年，屈茂辉与匡凯发文讨论计量法学的学科发展史。从实证研究的影响出发，基于定量研究的缺乏，论述计量法学在二十世纪八十年代以前、八九十年代、九十年代以后的三个阶段的作用。他们基于实证研究，特别是定量方法还处于推广阶段，提出了中国未来的法学定量研究的三个着力点：在立法预测和立法后评价方面发挥突出作用；引入判决预测和数据论证两个司法运用领域并产生积极效果；定量研究应在建立数据库上加大投入力度。(116)与之前的文章讨论相比，这篇文章虽然在讨论计量法学，但无论是标题，还是整篇文章，都在表达定量实证研究。从屈茂辉近几年的文章标题用语可以明确发现，他虽然是计量法学的提出者，但事实上一直是法学实证研究的坚定守护者。(117)

四、近十年出现的新兴术语概念

(一)计算法学

目前很难精确定位谁是“计算法学”的提出者，张妮和蒲亦非(以下简称“张蒲”，合著《计算法学导论》)应该是这一概念的首倡者。自从这个概念提出以后，广大学者反复在方法和学科层面讨论。然而，张蒲二人所提出的计算法学本身就是实证研究的代名词，更进一步讲就是基于数量法学、计量法学、量化法学而衍生出来的定量研究。

关于计算法学的概念，张蒲指出“计算法学是以具有数量变化关系的法律现象作为研究的出发点，采用统计学、现代数学、计算智能等技术方法对相关数据进行研究，旨在通过实证研究评估司法的实际效果、反思法律规范立法的合理性，探究法律规范与经济社会的内在关系”。(118)很明显，这里的计算法学，事实上是定量分析、实证研究，或数量法学、计量法学的定量研究。关于定量研究，张蒲在论述计算法学与传统研究差异时明确指出，“计算法学则主要运用定量的研究方法进行法学研究”。(119)关于数量法学的源头，二人仍定位在刘瑞复的文章。关于计量法学，二人提到屈茂辉在2009年发表的《论计量方法在法学研究中的运用》以及洛文杰的《计量法学：展望新纪元》两篇文章。在比较数量法学和计量法学的不同称谓后，张蒲认为“计算有从现有数量推断、预测出未知的意思”，“计算法学与计算机和计算智能联系在一起，以建立计算机网络、大型数据库、强大计算功能为背景”。他们的计算法学只是强调了计算机的作用，将计算机技术应用于法律现象的模拟研究，用计算机建立立法、司法模型，甚至用计算机进行多主体模拟效果。(120)

张蒲在《计算法学导论》第一章从数量变化、应用法学、量化分析、分析手段的数学和实证、数量关系等五个层面全面阐述了计算法学的含义。(121)然而，这本书余下几章的内容，差不多就是张妮先前量刑失衡和精神损害赔偿的定量实证研究学位或期刊论文。(122)因此，就张蒲而言，计算法学本质上就是定量研究和实证研究，计算法学只是术语差异而已。这从张妮在序言中提到的对白建军和屈茂辉的感激，可以明确找到证据。(123)

2019年，张妮在和徐静村合作的论文中，更新了她先前的定义，把人工智能嵌入进来。更新的定义在肯定研究层面的含义时，更多从人工智能和大数据挖掘方面强调计算分析的能力。他们认为，计算法学与计量法学、法信息学、计算法律学等概念相关，并将计算法学定义为“是随着人工智能在法学中深入应用而产生的一门交叉学科，使用建模、模拟等计算方法来分析法律关系，让法律信息从传统分析转为实时应答的信息化、智能化体系，旨在发现法律系统的运行规律”。(124)与此同时，他们把计算法学作为学科概念提出，“计算法学是法学与计算机科学、现代统计学的交叉学科，基于现代人工智能技术和大数据挖掘技术，属于法学的研究分支”。(125)在张妮与蒲亦非的其他论文中，计算法学作为“新兴学科交叉分支”被明确复述。(126)

计算法学在近年朝研究方法和学科概念两个方向发展，但都没有离开实证研究。第一个方向是，将计算法学作为一种研究方法，强调挖掘和处理大数据或海量判决书方面的能力。(127)但无论是否和如何用大数据和人工智能修饰计算机应用技术，基本都在论证实证研究或定量研究的能力和价值。例如，肖金明、方琨就将计算法学定位在法律实证研究中，“立于计算法学作为大数据时代学科演化之果的准确定位，基于法律实证研究的法学范式变革的明确定向”。(128)申卫星、刘云在梳理计算法学与数量法学的概念上，基于从法律计量学、法律信息学走向计算法学的基本思路，提出计算法学是“利用计算工具探索法律问题的实证分析，是指变传统的规范法学研究为以事实和数据为基础的实证研究，特别是在大数据时代，利用大数据挖掘技术对传统法律问题进行实证分析将成为探究法律问题的新方向”。(129)申卫星关于计算法学本身就是实证研究或实证分析的观点或论述，得到广泛支持。(130)

第二个方向是，将计算法学作为学科定义，并植入数学计算、计量法学、司法统计、数量分析、实证研究等元素。由季卫东领衔创立的中国计算机学会计算法学分会，将计算法学定义为“计算法学包括对于借助计算机科学和技术为手段开展的任何法学研究，其中包括利用司法统计资料进行判决分析和预测的计量法律学”。值得注意，这里的计量法律学，也就是何勤华所使用的概念。在进一步阐述计算能力时，他们还明确使用了“法律实证研究”。在描述其为法治中国搭建一个真正跨业界、国界、生态圈协作的开放性大平台业务方向时，第一个例子便是“基于中国大数据优势的预测式侦查和警务以及电子证据，同时开展关于判决预测和法律文书自动生成的实证研究”。(131)与此同时，季卫东在讨论计算法学的疆域时，也提出“计算法学的基本架构应该具备四个不可或缺、相辅相成的维度，即计量法律、自动推理、数据算法、网络代码”，并运用司法统计、大数据法学、数量分析方法等表现出实证法学的定量含义。他不仅引用张妮和蒲亦非的教材说明计算法学还处于初级阶段，还在讨论计量法律时提到数量分析方法和计量法律学。(132)因此，季卫东所描述的计算法学仍基于实证法学和实证研究而展开，这恰好又回到了他之前的理念和起点上。唯一的问题是，计算法学能否、何时朝他笔下三个维度展开，以及当前是否就具备了他质疑系统法学方法论时所提出的现实基础和条件。

在最近的研究中，刘建宏和余频也探讨了计算法学的相关问题。虽然他们得出的结论与先前的讨论相似，但他们的立场恰恰是回归“实证法学”。一方面，他们认为“计算法学代表了经验研究的2.0阶段，以数据为主导”，只是在“数据处理量级和数据处理效能上都有显著进展”而已。另一方面，他们虽然高度肯定“计算法学在方法论上强调数据主导和计算工具的应用”，但仍然指出“不能脱离传统法学研究的本体”。(133)从研究中心和实验室的名字叫“实证法学研究中心”和“实证法学中心实验室”可以看出，回归“实证法学”似乎是实证法学家们的初衷和用意。

(二)法律经验研究

“法律经验研究”概念首先由陈柏峰于2016年提出并在过去几年里反复被使用。(134)法律经验研究也被描述为“法律的经验研究”，(135)或“经验地研究法律”。(136)虽然这三个术语之间内部有差异，但都以“经验”为核心，都从强调经验研究和实证研究说起。例如，陈柏峰早期在论证法律实证研究与经验的关系时，将对西方理论背后的经验缺乏足够的认识和警醒、过于相信个人生活或调研个案的直接经验、对间接经验缺乏反思，直接总结为法律实证研究的经验偏差，并阐明经验是法律实证研究的一部分。(137)

从陈柏峰近乎完美的跨界学术历程来看，他事实上是从规范法学或法律社会学开始，(138)将法律社会学的经验研究嵌入社会学的田野调查。(139)他本人也是从注重田野和经验的实证研究者，(140)再到基于田野的社会学或社科法学的坚定支持者，(141)最后才提出自己独立的法律经验研究体系。(142)纵观陈柏峰的学术出版物，无论他是否明确以实证贯注他的研究，无论他用了什么概念和术语，基本上可以将其描述为基于乡村田野调查、经验观察的法学、社会学或法社会学实证研究者。一方面，陈柏峰出版物履历可以给出答案，他的研究基本上是围绕村镇、农村、乡村、基层、“混混”、农民而展开经验或田野调查。另一方面，陈柏峰自己也坦诚，“自2005年进入乡村研究领域以来，笔者坚持走经验研究的路线，坚持田野的灵感、野性的思维、直白的文风，关注了乡村司法、农地制度、农民自杀、村庄性质等多方面的问题”。(143)他也曾坦诚“基于实证分析结论，本文提出了保护贫弱农户地权的政策建议”。(144)只不过，他的法社会学、社会学、社科法学的实证研究道路并非沿着定量方法发展，而是朝定性方向走。关于这个结论，陈柏峰的名著《乡村江湖：两湖平原“混混”研究》封底的内容简介“对当前乡村社会性质变迁作定性理解”便是最好的证据。(145)

在2016年以前，陈柏峰虽然在各种研究中倾注对田野研究和经验研究的热情，但并没有独立地提出特有称谓。为了展示陈柏峰的法律经验研究的创新性，《法商研究》直接开设“法学新视野”专题讨论其“法律经验研究”。陈柏峰认为“法律经验研究的任务，是对法律现象作出质性判断，分析法律现象或要素之间的关联和作用机制”，“在法律经验研究中，田野工作至关重要，它是问题意识的来源，也是机制分析的场域”。(146)后来的笔谈中，他清晰阐明法律经验研究就是在苏力的社科法学上发展出的成熟方法论，田野调查是获取经验的最主要渠道。(147)然而，如前述，苏力提出社科法学时质性和经验研究论述不多，反而是实证和定量元素更多。

在后来的研究中，陈柏峰继续将法律经验研究总结为注重田野调查的质性研究，以此区别于他所定义的注重定量研究的法律实证研究。他指出，“将对法律问题的定量实证分析称为法律实证研究，将对法律问题田野调查基础上的质性研究称为法律经验研究，后者特别强调对研究对象的质性把握，强调研究者的经验质感”。(148)然而，在如此看重二者区别的同时，他又用英文“Empirical Legal Research”将法律实证研究和法律经验研究放在一个概念之下。(149)陈柏峰没有深入论证为何法律实证研究可以和定量研究画等号，但程金华的评述或许给予了他启发，“因为定义不同，学者们对于法律实证研究同‘社科法学’和‘法社会学’的关系认定也不一样。美国学者通常把‘Empirical Legal Studies’等同为定量研究……中国也有学者把法律实证研究等同于定量研究的，比如参见白建军……”(150)然而，程金华关于白建军和苏力的总结，也仅是个人理解，而非基于文献的考察。事实上，二者在提出之初都包含定量研究和定性研究，甚至是以定量为主。(151)

与陈柏峰将法律经验研究解读为质性田野研究不同，侯猛解读的“法律的经验研究”概念，事实上是放在与社会科学研究方法、实证法学研究、法律的社会科学研究、社科法学、实证研究同一水平。关于法律经验的研究与社会科学研究方法的关系，他指出法律的经验研究用宏观社会、微观社会、微观个体三种基本社会科学视角进行观察。在表达法律的经验研究的规模时，他用实证法学研究的英文概念(empirical legal research)。在关于法律的经验研究与社科法学的概念时，他指出法律的经验研究“主要运用社会科学的知识和方法，因此又称为法律的社会科学研究，在国内通常被称为社科法学”。(152)因此，侯猛笔下的“法律的经验研究”是定性和定量的结合，只是“定性方法的运用争议较小，但定量方法的运用就存有不同争议”。(153)贺欣在论证“经验地研究法律”时，也指出法律经验研究的根本特点是，“运用社会科学的方法，从法律的外部来研究法律”。(154)但贺欣笔下的“经验地研究法律”和“社会科学的方法”也是定量和定性的结合，只不过“定量的研究更像科学”，“定性的研究更像艺术”。(155)

陈柏峰的田野研究和经验调查，绝对助力其成为最了解中国乡村法治的法学家。这无疑与其博士生导师、华中科技大学社会学教授贺雪峰注重经典阅读和田野调查的“两经训练”有关。在贺雪峰看来，“社会学的长处是注重经验，注重用事实说话”，(156)故“中国社会科学研究应该坚持田野的灵感、野性的思维、直白的文风”。(157)从这个角度来看，陈柏峰用苏力的社科法学解读其定性、田野调查、经验研究，也只是围绕自己的社会学博士学位和学术经历解读定性的法律经验研究。也正因为如此，陈柏峰所提出的法律经验研究，在侯猛和贺欣看来，只是社科法学下位概念的定性研究。但如前述，苏力的社科法学概念本身，也只是实证研究的另一种提法，只不过近年来重新冠名而已。各种迹象表明，陈柏峰关于法律实证研究分化的理解，虽名义是定性和定量研究的分化，(158)但更准确的表达应是陈柏峰将过去和现在的理解分化，或者是法学与社会学的分化。

(三)人工智能法学

尽管人工智能或大数据已经在法学界被反复提及多年，但“人工智能法学”很长一段时间并没有作为一个正式概念被提出。申卫星将2017年称为“人工智能元年”。(159)虽然多数学者在论述人工智能法学并未直接提及实证法学或法学实证研究，但大量作品事实上又在用实证研究及相关概念反复论述。

从知网检索情况来看，程龙于2018年正式以“人工智能法学”这一概念署名发文。他认为“为实现具有主体性、整体性、体系性和可对话性的强法律人工智能研究即人工智能法学，需要以研究主体跨界参与、人才培养方式转变、研究方法革新和国际间交流合作等方式达致”。(160)就当前来看，虽多从学科和教学体系方面阐述，但仍有不少学者从研究方法的层面来谈人工智能法学。就作为方法的人工智能法学而言，实证法学家们论述了大数据背景下的实证或定量研究的重要性。例如，左卫民在论证法律界对人工智能的疏离时，尤其强调法学界对定量法学研究不多、善于运用统计方法的研究不多，强调人工智能算法和模型的重要性。(161)与此同时，论述计算法学和数量、数据、数字法学的学者，又反复强调人工智能的意义。例如，季卫东在讨论计算法学时，将人工智能和计算法学结合在一起。(162)即使刘艳红在讨论人工智能法学时并没有提到实证研究，但她呼吁建立传统社会科学和自然科学的新文科时，因注重“法学的实践性”而没有远离实证范畴。(163)郑妮在谈到人工智能法学的概念误区时，也提到“人工智能法学也更具备立足现实、关注当下的基本品格，秉持实证主义法学、实践性法学的思想观念”。(164)就学科体系来看，刘艳红认为，人工智能法学不是“人工智能+部门法学或(计算)数据信息+法学”，而是由“人工智能+法学”交叉融合而成的独立新型学科，所以她建议“应在法学一级学科之下设立全新的二级学科人工智能法学”。(165)

虽然人工智能法学在学科和教育范畴很难直接与实证法学直接画上等号，但若从钱学森笔下的数学方法、计算机建模、人工智能、数量法学来看，以及吴世宦的电脑法学等来看，人工智能法学仍然可以划归于实证研究行列。当代学者关于人工智能法学的表达和理解，无一例外地运用了与实证法学相关的其他概念。例如，苏宇在对江溯关于法律人工智能的专访中提问：“江老师您好，请问您是怎样接触到‘信息技术+法学’，或者说是数据法学/网络信息法学的呢？是怎样的一种机缘呢？”(166)江溯本人算是开展实证研究的学者，他还承接白建军担任北京大学实证法务研究所主任。(167)更好的例子，应该是岳彩申、侯东德主编的《人工智能法学研究》，几乎每期都刊登实证研究论文。此外，《现代法学》在计算法学专题中刊登人工智能的算法文章。(168)

(四)数据法学

较早提出“数据法学”这个概念的应当是何海波。他在迈向数据法学的专题絮语中，从方法层面阐明数据法学就是指“以数据获取和分析为重心的法律实证研究”。(169)不难看出，数据法学本身就是实证研究的一个分支，只是因为数据提供了资料、思路、方法。在迈向数据法学的第二期专题絮语中，何海波明确“为进一步推动以数据为基础的实证研究，我们再次组织这个专题”。总体上来看，何海波没有强调数据法学的特有方法和学科概念，而是充分尊重实证研究的传统定位。这从专题絮语末尾可以充分看出：“《清华法学》历来重视法律实证研究，发表过多篇实证研究文章。”(170)

与何海波相比，曾赟大力提倡数据法学作为一种独立的法学学科，并认为这是继法教义学、实证法学、计算法学后的第四种法学知识新形态。曾赟将数据法学定义为，“以法律数据为研究对象，运用数据科学方法创造法律数据产品和发现法学知识的独立的法律科学”。(171)基于法律大数据和全样本的研究方法特征，他认为这是数据法学不属于实证法学和计算法学的关键特征。他围绕法律大数据方法归纳出数据法学的三个特征，法律大数据是物质特征，机器学习算法是技术特征，算力支持是动力特征。他在没有比较和论证情况下，就直接判定法律大数据方法与实证法学、数据法教义学、计算法学的研究方法有明显不同。他用SIR模型举例来说明其理由，但没说明其逻辑基础、算法来源、阻断方法的内容。事实上，这个例子本身反映出他笔下的数据法学，恰恰是实证法学或计算法学的一部分。他一方面用“毒品基本传播数R0则可采用法律大数据方法计算得出”说明计算模拟和模型研究方法是计算法学和法律大数据研究的计算方法，另一方面又说明“R0可以通过抽样调查得出，而抽样调查的方法就是实证法学研究方法”。(172)
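上文提到的SIR模型与基本传播数，其逻辑基础可以用一个极简的数值示意来说明。以下代码仅为示意性草稿：传播率 beta、移出率 gamma 与各初始占比均为演示用的假设值，函数名亦为虚构命名，并不代表曾赟文中的具体算法或数据来源。在标准SIR模型中，基本传播数可理解为 beta 与 gamma 之比，大于1时传播扩大，小于1时逐步消退。

```python
# SIR 模型的极简数值示意（前向欧拉法）；所有参数均为演示用的假设值
def simulate_sir(beta, gamma, s0, i0, r0_init, days, dt=0.1):
    """返回每个时间步的 (S, I, R) 人群占比列表。"""
    s, i, r = s0, i0, r0_init
    history = [(s, i, r)]
    steps = round(days / dt)
    for _ in range(steps):
        new_infections = beta * s * i * dt   # 易感者转为传播者
        new_removals = gamma * i * dt        # 传播者转为移出者
        s -= new_infections
        i += new_infections - new_removals
        r += new_removals
        history.append((s, i, r))
    return history


def basic_reproduction_number(beta, gamma):
    """标准 SIR 模型下的基本传播数：beta 与 gamma 之比。"""
    return beta / gamma


# 假设值：beta=0.3，gamma=0.1，初始传播者占比 1%
history = simulate_sir(0.3, 0.1, 0.99, 0.01, 0.0, days=100)
peak_infected = max(i for _, i, _ in history)
print("基本传播数:", basic_reproduction_number(0.3, 0.1))
print("传播者占比峰值:", peak_infected)
```

在这一示意中，beta 与 gamma 无论取自大样本数据的拟合还是抽样调查的估计，进入模型之后的计算步骤并无不同。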

曾赟关于数据法学的学科和定位不明且自我矛盾，这也就决定了数据法学不可能是独立学科。按照他对数据法学的定位逻辑，数据法学方法是基于算法的理性演绎和基于法律数据的归纳推理。但理性演绎和归纳推理恰恰是大多数实证研究的品格，只不过是“算法”复杂程度和样本量多少的差异。曾赟本人长期肯定和偏爱实证研究，更主张实证研究限于定量研究。他直接命名为“实证研究”的多篇成果说明，数据法学本身就是实证研究。(173)

(五)数字法学

“数字法学”这个中文术语到底谁先提出，可能很难精准定位。从知网和图书检索来看，大致可以归功于马长山或胡铭。马长山注重学科含义，胡铭注重研究方法。但无论如何表达，是否明确肯定或使用“实证”，数字法学在实证研究层面与“实证”都是相同或相似的。例如，广州大学法学院创办的《数字法学》创刊词指出，本集刊“集中展示优秀的数字法学理论最新研究成果，以规范研究、实证研究、多学科交叉研究的方法”。(174)

马长山从学科层面提出概念时，认为“数字法学是新法科的重要学科，它是以数字社会的法律现象以及其规律性为研究内容的科学，是对数字社会的生产生活关系、行为规律和社会秩序的学理阐释和理论表达”，并认为这是“数字时代法律变革的必然要求和未来趋势，是数字时代的一场法学理论‘革命’”。(175)然而，他论证的数字法学三种演进路径，事实上都与实证研究不可分割。在表达新文科方法论路径时，他指出应突破传统文科的理论工具和研究手段，特别要运用算法，将文科的定性方法与定量方法相统一。在讨论认识论路径时，强调用算法把数字法学视为由归纳演绎向数据分析，由知识理性向计算理性，由人类认知向机器认知的范式转型。在讨论本体论路径时，他将计算法学等作为方法论的参照，阐明本体论中仍然是定量分析。他高度肯定计算法学是现代法学的当代转型，强调数字法学只不过比计算法学的范围和属性更为庞大复杂。(176)

胡铭在讨论数字法学时，没有给出概念定义，只是围绕“数字+法学”或“法学+数字”讨论基本定位和范畴。但是，他的多方面论述，似乎高度肯定了数字法学仍然是定量实证研究的一种方式和路径而已。在强调数字技术作为法治工具时，他提出改造升级现有定量法律实证研究，有助于更契合社会科学的研究范式。在讨论多元化贡献时，他强调保留法律实证研究等偏社会科学研究范式的方法论，引入大数据、机器学习等方法，数字法学的贡献就是丰富既有以统计学算法为主的工具箱。(177)在最新的研究中，胡铭更是交叉使用“数字法学”和“实证法学”，只不过新增实验方法而已。例如，文章摘要里明确描述“数字法学研究有必要引入实验方法。相较传统的实证法学研究方法，实验方法在挖掘数据规律、确定变量之间因果关系等方面具有可复制性、可验证性等优势”。(178)总体来看，就胡铭所解答的数字法学与实证法学而言，最大的区别也仅在于传统与非传统。

姜伟和龙卫球编写的《数字法学原理》，采用“本体论”概念，认为数字法学是“将基于数字技术应用而产生的法律现象本身作为研究对象，侧重对于具体法律问题和法律制度的分析”。(179)此外，这本书介绍的数字法学研究方法，包括规范分析、社会学、比较法、计算法学四种。(180)在后续的讨论中，姜伟也将数字法学上升为独立的学科层面，把数字法学看成是法学的分支。虽然似乎看不出姜伟的数字法学概念与实证有多大关系，但他在论述数字法学的学科特点时，又强调数字法学是“综合和交叉学科”“计算性的实证法学”“实践性的理论学科”。(181)因此，姜伟和龙卫球笔下的数字法学，无论是直接归结于社会学的社科法学，还是直接归结于计算性的实证法学，实质都是基于实证研究的实证法学。

(六)其他概念

一是认知法学。张妮、蒲亦非以量化为核心，在2021年发文首次提出“认知法学”概念，并认为“从计量法学、计算法学发展到认知法学是法学研究的必然趋势”。(182)然而，张妮本人是从《量刑的模糊评价研究》的实证研究结论开始，(183)发展为法学实证分析、法学量化分析、法学定量研究的博士论文《精神损害的定量研究——以医疗损害赔偿裁判为例》。(184)之后，张妮又从“实证研究”概念开始，(185)发展出“量化法学”，再经计量法学发展出“计算法学”和“认知法学”概念。(186)例如，张妮本人在论述量化法学或计算法学的概念时，依然表达了定量研究是在实证法学基础上发展出来的，“法学定量研究是实证法学研究与现代计算机科学发展的必然趋势”。因此，虽然张妮是各种新概念的提出者，但她的多篇课题成果始终围绕司法案例，利用数学和统计学方法展开法学定量研究、实证法学研究。这也是张妮本人多年来长期混用各种概念和术语的重要原因。

二是实践法学。这个概念最早由左卫民提出，但他倒没有为“实践法学”定义。不过，按照他的论述，“未来中国刑事诉讼法学的研究应该在坚持规范研究、价值分析的基础上，适度迈向实践法学，不仅要注重借鉴社会学传统的访谈、参与式观察等定性实证研究方法，更要注重借鉴最近几十年来在社会学、经济学、统计学等领域兴起和发展的数理分析等定量方法”，(187)他笔下的实践法学仍是实证研究的另一种表达而已。

三是自科法学。这个概念同样由左卫民提出。他认为在法律大数据时代，传统的实证研究方法对有限的数据进行基于人力计算的整理、分类与分析，可能不再受到重视、没有生存空间。因此，他认为“法学与其他学科的交融似乎开始由社会科学扩张到自然科学。这种交融的产物似乎可与社科法学相对应地称之为‘自科法学’，即运用自然科学的思维方法以及技术，特别是统计学、数据科学等来研究法律问题与现象的法学研究范式”。(188)他认为社科法学是为了使法学研究经验化，自科法学是为了法学研究的科学化。鉴于此，他认为首先需要思维理念的变革，不再简单借助统计工具解决传统法学研究的问题。其次，自科法学的关键功能是强调法学研究的证伪思维，借助数学、统计学、计算机科学等学科方法排除个人主观影响，利用数据来判断某一法律问题究竟只是局部、偶然现象还是制度流弊。(189)左卫民坦诚无意介入学科之争，传统、单一概念无法概括新的研究范式，但他又认为实证研究在大数据的助力下应该迈向自科法学。这个论断意义深远，不仅暗示着尚未普及就已经作为传统的实证研究可能终结，也可能是将其提倡的定量实证研究提升到新高度。但如前文所述，自科法学的论断事实上与二十世纪八十年代钱学森的论断如出一辙。从左卫民将法学研究范式的讨论定位在苏力的《也许正在发生》可以看出，(190)他同样没有意识到，二十世纪八十年代已有关于法学向社科和自科发展的阐述。

事实上，如前文所提，我国法学界关于“实证研究”的概念，还有如“法律计量学”“法律信息学”“计算法律学”“信息法学”“未来法学”等，限于篇幅无法一一梳理。但可以肯定，无论是否明确用实证研究，也不管是否坚守实证研究，更不管以哪种定义模式，最近十年的争议本身是围绕大数据分析而产生的新兴概念。从这个角度来说，周翔的评论可能最为中肯，“大数据技术对于实证研究而言有一种接力的价值，两者的共性大于差异。大数据技术主要应定位于加强实证研究的某些环节，但不改变实证研究基本的方法论框架”。(191)但问题是，“大数据”真的就那么重要吗？如笔者几年前早已论断那样，如果没有裁判文书网所承载或开源的判决资料，难道法学家就不研究法律的实际运行状态了吗？

五、回归“实证法学”的倡议和路径

(一)多元语境下的实证法学本质

前文已反复说明，不管这些传统或新兴术语如何解释或描述，法学家们都不约而同地直接或间接阐释实证研究。因此，与其用多种复杂多变的概念创造新意，还不如在法学领域里坚守实证研究这个传统领地。原因很简单，实证研究是前述各种术语的本质，作为研究方法的实证法学当然也是法学家开展实证研究的本质。即使作为学科概念，实证法学也是法学领域、法学教育和法学家身份的本质。与此相比，其他术语只不过是法学家们在多元语境中的替代性术语罢了。对于法学和法学家来说，无论是学科还是方法概念，实证应该是那些善于求真的法学家们讨论问题的起点和终点。理解这个本质，有如下几方面值得注意：

第一，主流实证法学家在提倡概念或术语多元化的过程中，可能需要根植于实证研究这个基本概念。左卫民应该是思考实证研究较为深入的学者，不可否认也是法律实证研究的守护者。在过去十年里，他在不同文章中围绕定量研究创造和提出了多种概念术语，均在阐明法律或法学实证研究的各种内容。他基于美国定性和定量实证研究范式的迅猛发展，开始用实践法学呼吁用数理和计量研究方式研究中国的客观实践。后来，他的这些观察陆续发展为作为专门探讨实证研究的法律实证研究、实证研究、实证法律研究、法学实证研究、定量法学、计量法学，阐明实证研究的定量、数量、数理特征。直到最近，他用计算法学、自科法学等高级版本，为实证研究做更新换代。然而，左卫民的解读不仅没有使实证研究的概念术语定型和更加稳固，反而因定量研究和数据分析而缩小了实证研究的圈子。

第二，就传统术语的发展和演变过程来说，实证法学家们应当从源头上始终抓住实证研究这个本质。当代中国的实证研究起源已长期被误解，应当重新将实证研究归功于钱学森在二十世纪八十年代所激励的范式讨论和转型。与此同时，社科法学从提出开始，就没有否定实证研究，也没有否定定量研究，而是包含定量和定性两种范式的实证研究。然而，该中文称谓实在有欠科学性，并且经过近十年翻新，社科法学已从注重定量和定性的实证研究，被解读为只注重定性、经验、田野调查的质性研究。无论是主张定量为核心的实证研究学者，还是社科法学的传承者，言过其实地将质性研究作为社科法学的主阵地，都应当反思实证研究与社科法学的初始关系。与此相比，屈茂辉所在的湖南大学长期是计量或数量法学的主要阵地，以定量的实证研究为特色，有别于个案分析的实证研究，但从不否定其实证法学定位。

第三，主张新兴概念的实证研究者，应当在科技和技术背景下坚守实证研究与实证法学的原动力和阵地。关于这一点，计量法学(数量法学)论者注意到人工智能对实证研究的影响，但仍然倾力和坚守实证法学研究这个本体，值得赞赏。(192)于晓虹、王翔指出，“计算法学是计量法学进入大数据时代的产物。从学科构成看，计算法学属于实证法学的范畴”。(193)再如，何海波在论述数据法学时，仍然将其作为实证研究的表达方式。(194)还如，胡铭在论述数字法学时，仍然强调定量法律实证研究方法的充分运用。(195)因此，就新兴概念来说，无论是精于实证的法学家，还是本就不擅长或不熟悉实证的法学家，都应当注意，只有做出好的实证研究，才能展现出其学科和学术的吸引力。如果不能如此，新兴概念要么只有概念意义，要么只能是非学术层面的商业或技术概念。此时，擅长实证研究的法学家又如何坚守其学术阵地，历史教训已经相当深刻！虽然二十世纪八十年代广泛讨论研究方法，但除了高赞和畅想概念之美以外，基本没有参与讨论者在其概念体系下坚守发展。否则，中国今天的定量研究早已领先世界，而不会出现仍唯美国马首是瞻的怪圈。

第四，回到当代中国实证研究的真正起点，不刻意用中断年代或阻断学术传承的方式，创造法学研究的新范式或新概念。中国实证法学家应当做的是，在阅读文献的立场上，在尊重前人术语的基础上，展现学术传承的高风亮节。如果每一个概念都能追本溯源，学术研究早就从华而不实的思想讨论，迈到丰富的具体研究中了。无论概念家们在论证时是否引用，或者是否知道钱学森与吴世宦的贡献，但其围绕数学、统计、计算机分析等各种论述均没有超越法治系统工程和系统科学方法的范畴。数据法学、数字法学、计算法学、人工智能法学与早期的计量法学、计量法律学、数量法学本质上并无差异。只不过，今天所谓的大数据时代为新术语创造了新素材，但这不能成为创造新概念或术语的绝对理由。值得指出的是，这些新兴概念本身是否严谨、概念是否周延，仍有疑问。

第五，实证法学家们使用外国概念或术语论证传统与新兴概念时，既应充分尊重翻译和用语习惯，也应注意各种外文词汇的原始语境其实就是实证研究。信息化时代即使不懂英语，也可随意通过翻译软件查阅中文含义，然后大论连篇。抑或，随意翻阅几篇中文大论，参加几次会议拾取后生牙慧，参考同行的简短评语，有心的术语家也可凭高超的灵感创造新概念。然而，仅凭这些学术捷径，显然不足以创造出具有理论和现实品格的真研究，反而会因术语翻译陷入无休止或自说自话的争论中。对于传统概念来说，关于empirical legal studies或empirical legal research在中文就有法律实证研究、实证法律研究、实证法学、法学实证研究、实证法学研究、法律经验研究等翻译。英文术语不仅被用在传统概念中，而且还直接或间接用于或论证几乎所有新兴概念中，如计算法学、法律经验研究、数据法学、数字法学、人工智能法学等。对于新兴概念来说，洛文杰的jurimetrics在各种文章中被翻译成或论证成计算法学、数据法学、人工智能法学、法律计量学、计量法律学。然而，应当注意到，洛文杰的jurimetrics是与法理学jurisprudence相对应的概念，“jurimetrics强调路径上的实践(practical)，jurisprudence强调哲学上的思辨(speculations)”。(196)只要稍微阅读原文，便可发现他笔下的jurimetrics更是在肯定“调查”重要性，作为实证研究的词汇和词义而提出。例如，这篇文章empirical使用了3次，statistics或statistical使用了6次，甚至还用了定性qualitatively和定量quantitatively。
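上述对关键词出现次数的清点，可以用几行代码复核。以下仅为示意性草稿：函数名 count_terms 与示例文本 sample 均为假设，并非洛文杰原文，读者需自行代入待检文献的全文。

```python
import re
from collections import Counter


def count_terms(text, terms):
    """统计若干英文检索词在文本中出现的次数（不区分大小写，按整词匹配）。"""
    counts = Counter()
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        counts[term] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


# 假设的示例文本，仅用于演示，并非任何文献的原文
sample = (
    "Empirical inquiry relies on statistics. "
    "A statistical test is an empirical tool, "
    "used both qualitatively and quantitatively."
)
result = count_terms(sample, ["empirical", "statistics", "statistical"])
print(dict(result))
```

中文文本没有以空格分隔的词边界，清点“社科法学”“实证研究”一类中文术语时，可改用字符串的 count 方法直接统计子串出现次数。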

(二)实证法学可以实现整合的路径

解决了法学领域实证研究的本质问题，技术上就可以探寻整合实证法学的路径。经过前文对各种术语的定义和产生背景的追溯，笔者认为，相比于其他概念而言，只有实证法学可以实现法学领域实证研究的真正统一。在约定俗成和用语习惯的背景下，法学家们都在用“实证”或“实证研究”这两个高频词。既然如此，实证法学在统领各种概念语系后，先作为方法的实证研究，进而上升为作为学科层面的实证法学，也就顺理成章了。一方面，法学家的身份证和法学的学科定位，使得“法学”自然是核心的内容。另一方面，实证作为各种概念体系下的“中心词”，就如身份证所对应的人一样，使实证法学家的个体身份特征才有独特标示。从这两个层面来看，实证法学作为研究方法和学科层面的上位概念，不仅可以统领作为某一种或某一面的下位概念，也是整合学术研究队伍的最佳途径。与此相比，其他概念或术语不可能具有这种得天独厚的优势，也不可能实现学术整合。具体理由如下：

第一，从语法和法学学科的表达习惯来看，“实证法学”比“法学实证”更好。尽管与其他概念相比，法学实证已经是很不错的表达了，但仍没有达到实证法学的效果。一方面，法学实证中的“法学”是作为形容词“法学的”修饰词，“实证”是作为实词的名词。这样一来，法学实证的重心就没有放在“法学”的学科概念上，反而是将“实证”放在中心词地位强调法学的实证。与此相反，实证法学作“实证的法学”解读，就不仅将“法学”作为核心上位和中心词理解，而且“实证的”也强调了“实证”是作为法学下位概念的修饰词而已。另一方面，实证法学这个称谓也符合其他部门法层面的二级学科用语习惯。既然实证法学要作为二级学科概念，就应当参考其他术语。除了法学理论或法律史学外，与刑法学、民法学、行政法学等其他二级学科一样，实证法学倒比较适合这个用语习惯。从术语渊源来讲，实证法学作为“实证的法学”表达，也和英文的表达结构差不多。

第二，从术语概念的产生时间来看，葛洪义的实证法学应当是继钱学森的数量法学后，形成较早体系的概念。差不多与实证法学是同一时期的概念，也就是何勤华笔下的计量法律学或计量法学，这也是从数量法学和法治系统工程学所演化而来。虽然数量法学用得更早，也更正统，但这个术语过于强调数学或量化研究。从学科发展来说，如果用小概念而放弃更大概念，因小失大无异于舍本逐末。如前述，无论是季卫东、齐海滨笔下的实证研究，还是葛洪义明确提出的实证法学，两者都基于区分规范(价值)研究的法学方法展开。虽然实证法学概念在初期并未展现出今天的内容，甚至还在计量和实证主义法学之间寻找容身处。但应看到，这一概念扎根“客观规律”或“现实世界”，(197)以及“实际存在的法律制度和它的实际运行状况”。(198)与此相比，法律实证研究或社科法学是更晚概念，其他新兴概念更是最近十年才出现。

第三，从关于实证的学科和方法含义来看，实证法学比法律实证或实证法律更具有含义。一方面，法律实证或实证法律(研究)是从方法层面把实证作为分析法律的研究手段，这很难上升到学科概念。原因很简单，只有纯规范的法律问题才能展现出与法律条文紧密相关，大量法社会问题和法运行、法经济、法心理、法律史问题并没有直接展现与法律条文相关，而是展现出法学背后的大量其他问题。例如，社科法学或法社会学所研究的问题，本身并不一定与法律条文相关，但他们所研究的每一个小问题又时刻关系到法律的生命力。另一方面，实证法律或法律实证(研究)总是围绕法律而转，而没有从学科上升至与法律相关的学问，这就大大降低了实证法学本身的学术价值。如果能上升到法学的学科和学问层面，实证法学就能够建立与理论法学或部门法学相对应的学术倾向，更与新兴概念号召成立二级学科的使用规律相似。

第四，从内涵来看，实证法学不仅可树立以法学为中心的学科体系，而且中立地将各种研究方法纳入其中。就学科而言，实证法学比社科法学、计算法学、人工智能法学等各种概念更能突显法学的中心地位，也可避免数量法学或计量法学有模仿经济学的称谓痕迹。这样一来，不会出现社科法学用社会科学这一上位概念掩盖社会学、法社会学的寓意，也不会出现像数量法学、计算法学一样过于强调统计学和计算机专业或技术的特性。事实上，左卫民2013年提出的以定量和数据分析为中心才是实证研究的评价或定义，同时认为社科法学是个案式研究而无法关注普遍性。(199)这才导致法律实证研究和社科法学的方法彻底分离，并最终挤出法律经验研究这个概念，并在近年来讨论得越来越深入。然而，正如张永健和程金华试图调和的一样，法律实证研究包括定性和定量研究。(200)社科法学和法律实证研究事实上都在强调实证，彼此并无否定实证研究特性的意思。多数新兴概念无论是从方法还是学科概念来看，均不自觉地朝算法、大数据分析、计算机等理工科方向跑，这自然也有其固有缺陷。

第五，实证法学的概念，以极其简便和简单的概念术语回归“法学”，有利于形成以法学家和法科学生为中心的方法和学科概念。就当下讨论实证研究的学者们而言，应时刻以法学家的教育背景和承受法学教育的法科学生为中心。脱离法学家和法科学生的法学教育，或者割裂法学研究者能力的任何高技术表达，都注定只能远离法学。即使听起来多么酷炫和合理，最终只能因过高的起点，让法学家的参与度越来越低，终因方法恐惧而被挡在起点上。例如，那些擅长用人工智能法学和计算法学来包装其学科和方法论者，要么是用高科技公司的数据挖掘和建模技术构建商业路径来吸眼球，要么是为计算机学科作嫁衣。就商业动机来说，高科技概念背后的大数据采集本身就源于法律灰色产业。无论是否愿意承认，大量以司法判决为核心的数据爬虫公司，已用或正用计算机手段违法或违规地进行信息或数据采集。此时，新兴概念所赖以生存的技术路线，已经脱离和远离了学术、学科的法学本来面貌，成为大数据公司挖掘判决文书的最好借口。就为其他学科作嫁衣来说，最好的证据是季卫东所领衔的计算法学，以“中国计算机学会计算法学分会”的形式搭建在中国计算机学会下面。但应当注意，中国计算机学会的官方口号是“为计算领域的专业人士服务”，(201)但法学家在计算(机)领域的专业度又是什么呢？除了法学概念或规则的阐释和帮助以外，技术问题还只能由计算机专家解决，这不过是貌合神离的两张皮。

第六，实证法学家永远是各种术语概念下的学术核心力量，将各种概念回归实证法学本身也算是名正言顺。当前各种概念所催生的学术组织或学术活动，参加者要么以擅长定性研究的学者为主，要么以擅长定量研究的学者为主，要么以实证法学的综合性或下位概念的支持者为主。再以中国计算机学会计算法学分会的管理层为例，(202)会长季卫东是主张计量法律学、法社会学、社科法学、实证分析、实证研究的学者；副会长左卫民是众所周知的法律实证研究学者；副会长申卫星又是阐释计算法学为“利用计算工具探索法律问题的实证分析……以事实和数据为基础的实证研究”的学者；(203)秘书长林喜芬也是擅长实证研究的知名学者，在中外法学期刊都大量发文。中国计算机学会计算法学分会只是若干场学术活动的缩影，人工智能法学也大概如此。可以毫不隐讳地说，离开擅长实证研究的学者，任何传统和新兴概念注定不可能发展起来。例如，阙梓冰总结的10篇计算法学成果，基本主要来自实证研究的学者或论证计算法学作为实证研究方法的学者。(204)当然，实证法学家参加各种新兴概念所组织的活动，一方面可以理解为拓宽路径的有力形式，另一方面也可理解为新兴概念在挖墙脚。如果是前者，实证法学家需要的恰恰是回到学术起点，先将实证法学发展壮大起来，而不是与自己所擅长或经营的学术阵地渐行渐远。如果是后者，就不必说了。

(三)“实证法学”术语重申和概念重解

要整合既有法学领域关于实证的研究或学科提案，需要进一步完善“实证法学”概念。当然，具体定义可能需探索，但整合学术队伍和研究实力在任何情况下是正当的。如前述，笔者在几年前讲座中，从研究层面使用了实证法学概念(以下简称“旧定义”)。(205)然而，在近几年“实证法学导论”授课中，笔者越感自己提出的旧定义不太完整，各种问题还不少。因此，为展示实证法学的宏观路径，笔者2020年就已在授课中重新定义实证法学。本文中，笔者将“实证法学”重新定义为，“利用各种资料对法学相关问题展开实证研究的学问”(以下简称“新定义”)。笔者希望，这可以为实现整合路径做些铺垫。

从概念内容来看，新定义注重各种资料、法学相关问题、实证研究、学问等四个方面的内容。“各种资料”是实证研究的素材，这在旧定义的解读中已经说明，不仅包含数据，还包含文字、语言、符号、声音、手语、代码等。新定义不问资料以何种形式呈现，只要它是一种素材，不强调数据或经验，只要可以成为和称为研究资料，都是资料。如过去在讲座所描述的，“实证资料的核心是信息转换”，“任何资料的本质是一种信息交流，而有效交流的核心是建立一套恰当的信息转换机制，使不同形式的资料之间有效互通”，“无论资料形式是什么，都是建立在信息交换机制下的产物”。(206)新定义将研究对象定位为“法学相关问题”，换句话说，只要与法学相关的问题，都是实证研究的对象。如此看来，新定义不是仅将法律作为实证研究的对象，而是将与法学相关问题作为研究对象。新定义突出实证研究的本质特征，这不仅可以突出“实证”的方法本质和特征，而且也可由其方法产品筑成作为学问基础的“研究”。总之，新定义与旧定义相比，自认为更为合理，但限于篇幅无法在本文详细展开，如下略述其在方法或学科层面实现整合目标的优势：

首先，新定义从“资料”层面来看，不仅避免了传统概念的实证维度争议，也避免了新兴概念所倡导的是否为大数据分析的争议。资料决定方法，有什么样的资料，就有什么样的方法。要整合实证法学当前出现的概念争议怪圈，首先要将资料这个基本问题解决掉。就传统概念来说，是否属于实证的概念争议，本质上集中在定量和定性对应的资料之争。张永健和程金华充分注意到这个问题，指出定性和定量研究资料只有形态、获取方式、学术努力方向差异。(207)与此同时，就新兴概念来说，近年来的争议实质就在于计算机技术所代表的大数据、大样本，和传统研究方法的小数据、小样本分析，这本身也是个资料面的问题。唯一的区别是，人财物成本差异和技术路径的实现方式不一致。但只要稍有统计学知识，即可知抽样对总体的代表性原理，夏一巍也充分证明了这一点。(208)因此，“各种资料”只强调资料是哪一种，资料的多少、大小、形式、内容、属性差异完全不是问题。
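上文所说“抽样对总体的代表性”，可以借助一个简单的随机抽样示意来体会。以下代码仅为示意性草稿：总体规模、特征比例、样本量与随机种子均为假设值，函数名亦为虚构命名，并不对应夏一巍或其他学者的具体研究设计。

```python
import random


def sample_mean_vs_population(population, sample_size, seed=42):
    """对总体做简单随机抽样，返回 (总体均值, 样本均值)。"""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)  # 不放回抽样
    pop_mean = sum(population) / len(population)
    sample_mean = sum(sample) / sample_size
    return pop_mean, sample_mean


# 假设的总体：100000 份判决，其中 30% 具有某一特征（记为 1），其余记为 0
population = [1] * 30000 + [0] * 70000
pop_mean, sample_mean = sample_mean_vs_population(population, 2000)
print("总体比例:", pop_mean)
print("样本比例:", sample_mean)
```

在这一假设情形下，2000 份样本比例的标准误约为 0.01，因而样本比例通常落在总体比例上下几个百分点之内；全样本与抽样的差别主要体现为估计精度与获取成本，而非方法论上的断裂。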

其次，新定义从研究对象来看，强调“法学相关问题”，而不仅仅是法律或法律相关问题。就实证法学本身的概念而言，较先前的“法律”对象来说，新定义的概念在广度和深度上更加有利。笔者的旧定义将实证研究定位在“法律运作”，这明显太窄。与此相似，各种传统概念都将“法律”作为概念本体，但“法律”这个研究对象显然不够宽大。(209)就新兴概念来说，不管是否明确表示其研究对象，几乎也是以“法律”为核心，如法律经验研究就定位于“法律问题”。(210)如果能跳出像规范法学那样以“法律”为中心，回归实证法学家所关注的包含但不限于法律的法学问题，没有人会怀疑身在法学院的法学家研究法学问题的能力。

再次，用“实证研究”概括所有研究方法，可以消除因方法差异而引发的概念和群体分化，实现方法和理念整合。就实证研究本身来说，本来应只有具体方法不同，而不存在是否是实证研究的差异。然而，当前关于是否属于实证研究的概念争议，很大程度上也是方法不同所导致的是否是实证研究的差异。例如，左卫民在长期定义法律实证研究或其相关概念时，都用大范围、大样本、全样本、大数据、超大样本定义实证研究，将其理解为运用统计学或计算机技术的定量数据分析。近年来提倡的法律经验研究以经验和田野的质性研究为核心，并因此区别于基于定量分析的实证研究。(211)与此同时，新兴概念的数据法学和计算法学本身就承认是实证研究，人工智能法学、数字法学等本来也仅是计算机参与程度不同的实证研究。

最后，新定义将核心放在“学问”上，将从方法和成果层面的实证研究拔高到基于学问体系而成的学科概念。尽管从目前来看，基于方法和成果层面的实证研究还需努力，所有讨论实证法学的学科概念都为时尚早。原因很简单，实证法学不论是从方法和成果层面展示其独特性和可接受性，还是要在新文科理念下形成自己的学科，都必须先由众多实证法学家所组成的“研究队伍”产出够分量的实证研究学问。只有当学问的体系足够丰富和多样，才有可能发展为学科概念上的实证法学。从目前来看，实证法学家们还停留在自己“道”的讨论上，远没有形成“术”的整合，这种局面不可能有助于形成学问体系。因此，只有把握“学问”本身的学术意义，平息概念之争，才有可能实现学问共同体。

六、结语

整体上来看，过去十多年各种概念体系下所组织的有关实证研究的年会交流频繁，但基本属于同一个小众群体在不同场合的学术奔波，内行也都明白这些年会都在勉为其难地苦苦支撑。因此，从整个法学研究的科学化道路，以及发展实证研究的研究队伍上看，改变当前学术分化的局面是刻不容缓的。为此，本文在梳理各种相关概念的来龙去脉后，本着壮大实证研究队伍的基本立场，共商中国实证法学发展道路。本文尊重既有文献的讨论，尊重前辈先贤的知识贡献，注重梳理各种概念的前后联系。写作过程中虽保持客观真实和事实描述，但难免因篇幅删减出现描述或表达不到位的问题。文献回顾最难，这不仅因为要评前人，而且可能总结不准。故虽费劲完成，但也可能费力不讨好，甚至还会因此得罪前辈先贤。但笔者相信，中国实证法学的发展必然首先需要整合和统一理念，总会有人为了学术共同理想而挺身而出，只不过是谁和什么时候而已!若能如此理解，笔者最大的心愿是，读者和同仁能回归实证法学和实证研究本身，开展学术研究和共建学术队伍。

笔者重申，一个人的研究叫爱好，一群人的研究叫队伍。因此，所有致力实证研究的法学家们，应该思考的是实证法学及其队伍的未来发展和学术影响力问题。当务之急是共商和共谋学问大计，搁置人为构置或理解所引发的窝里斗，寻找发展中国实证法学的同一片蓝天。在此，笔者坦诚地呼吁，停止有关法学领域的概念或术语争论，停止一切分裂或分离实证研究队伍的做法，将实证法学作为统一的术语和概念研究中国法制、法治特色，为世界贡献中国法学的智慧。理由极其简单，只有如下四点：

首先，深入了解当代实证法学真正起源于二十世纪八十年代后，今天及未来很长一段时间关于实证法学的概念和方法讨论，不可能超越由钱学森和吴世宦所引领的数学、计算机、人工智能、法治系统工程的方方面面。除了文字表达差异或技术途径的具体化以外，各方面内容都不会逃离系统论、信息论、控制论三个维度。

其次，本文已充分展示，各种术语及其概念体系下都是以实证研究为本质，任何关于概念或方法的争议只是文献断代式误解或曲解。只有充分了解和梳理现有文献的情况，才能有理有据地提出新概念。否则，任何唐突地创造概念，或以偏好或擅长为概念基础，只会让实证法学研究永远都停留在概念阶段。

再次，只有同心协力和万众一心，才能真正实现实证法学研究队伍的发展和壮大。过去十多年，各种学术活动都是新标题、老面孔，真正的学术新人实际很少。虽然名为实证研究的数量有爆发式增长，但主要是概念和思想的量产，真正的实证研究还一如既往地艰难挣扎在起步和发表阶段。实证研究长期被称为小众，原因就在于，真正的实证研究在中文世界产出量少，实证研究的学问体系还未真正建立。

最后，用最简单、中立、宽广的概念，比用高大上的概念更能吸引新兴学者参与到研究队伍中。实证法学最大的危机，不是仍是或将来还是小众，而在于新概念的技术和理念复杂性，造成没有人愿意和能够加入这个群体。法学的文科属性决定了其与自然科学的课程和体系的差异，如过于强调数学、数量、数理、数据、数字，或计算机、计算、算法、人工智能，实证法学永远难以有质的发展。

笔者相信，实证法学家愿意回归实证研究的本来面目，愿意将开展以法学问题为核心的实证研究作为己任。故，实证法学家们真正需要做的是，从建立或继续建立自己的学问研究体系入手，围绕某一个问题、领域、学科持续深入跟进。在更多人的理解和努力下，将每个问题、领域、学科做深做大，形成群体或集体性的学问整合体系。只有如此，才能发展出真正意义上的实证法学。当然，这至少还需要一代或两代学者的勤奋耕耘，最后才有可能实现学科层面的实证法学宏伟目标。

在“首届智慧法治高峰论坛暨第九届数量法学论坛”上，作者报告了与本文相关的问题，但限于主题只提到文献依据而没有展开系统回顾。虽王禄生教授高度肯定报告内容，但屈茂辉教授和魏建教授对相关概念是否属于实证法学有不同意见。本文的完成，应该特别感谢三位同仁的启发式讨论，故作者认为有必要从最基础的概念术语梳理着手。

本文转自《湖湘法学评论》(长沙)2024年第1期

2024-12-03
黄波粼钟子善：上海农村集体托幼实践的考察（1958—1962）

从思想史的脉络来看，关于托儿所与幼儿园的构想无疑具有相当长之历史。从柏拉图的《理想国》①，到启蒙运动时期众多的空想社会主义者②，至恩格斯③，后至康有为的《大同书》④、青年毛泽东⑤等都提出过“幼儿公育”的想法。从实践层面而言，清末以降，许多政治、社会力量都认识到了公共育儿的必要性，这使得公育思想在本土化建构中与实体建设并举⑥，托幼事业逐渐成为各种现代性设计不可或缺的一项重要内容。新中国成立后，公共托幼的必要性和重要性更加突出，因为恩格斯曾提出随着生产资料转归社会所有，“孩子的抚养和教育成为公共的事业”⑦，列宁也曾强调托儿所及幼儿园是共产主义“幼芽的标本”⑧。新中国托幼实践的大规模开展，大体发生在1958年至1962年，学界在这方面成果较丰，大多从妇女史的视角来回应马克思主义理论下的“妇女解放”问题。⑨有学者认为，大力推广集体托幼有着培育“共产主义新人”的意义。⑩有学者通过分析20世纪50年代末的农村集体托幼进一步指出，兴办托幼并非只为培育“共产主义新人”，它更是一个塑造“共产主义新农民”的过程。11尽管现有成果已经对新中国集体托幼有了比较深刻的认识，但仍有一些问题需要继续追问。例如，回到具体的历史进程和情境之中，国家诠释的“共产主义”具有哪些意涵？它们又是如何被农民接受的？本文拟以地方档案为主要史料，考察1958年—1962年的上海农村12集体托幼实践，着力展现其围绕各个阶段中心工作而曲折发展的轨迹，通过回溯若干具体措施深入农村的过程，呈现一段塑造共产主义新农民的历史。

一、塑造农民对于共产主义精神的政治认同（1958年9月—12月）

延安时期，毛泽东给延安第一保育院题词：“儿童万岁”13，又强调一定要为教育后代而努力。新中国成立初期，“社会主义老大哥”苏联的集体托幼模式引发国人极大兴趣，14集体托幼在当时看来是一种与共产主义生活相适应的教养模式，被视为“共产主义萌芽”，直接关乎共产主义新一代的培养，“为将来的共产主义社会准备了‘人’的条件”，因而是“一万年都要做的工作”。15

1958年北戴河会议后，毛泽东就指出人民公社办托儿所的重要性：“是搞钢铁，搞棉花、小麦重要？还是孩子重要？这是涉及下一代的问题。托儿所一定要比家里好些，才能看到人民公社的优越性”。16当年9月，国务院下发的《关于教育工作的指示》明确提出：全国应在3年—5年内，完成“使学龄前儿童大多数都能入托儿所和幼儿园”的任务。17上海农村托幼组织的大幅增长也是从1958年9月开始的。与此同时，上海农村确立将“家务劳动社会化”作为向共产主义过渡的重要内容。其中，“儿童教养集体化”是“家务劳动社会化”的首要目标18，即开办托儿所和幼儿园，集中教养7岁以下的社员子女19。除房屋外，摇篮、凳子、床铺、被子等都是开办园所必须具备的，全托还需食具、毛巾、水瓶、浴室、脚盆等。仅凭公社或生产大队积累下来的少量物资与资金很难满足办所办园所必要的物质条件。如何在“少花钱”乃至“不花钱”的情况下快速搭好托幼机构的架子，将儿童“迎”进来，成为摆在基层干部面前的首要难题。考虑到添置新的设备、物资需要花一大笔钱，且在短期内还不易购买到，因此不仅要坚持“因陋就简，勤俭节约，自给自足”的原则，还要发动群众通过借、调、征用等方式凑起必需物品。宝山县红旗人民公社第五生产大队全托幼儿园的建立就是一个典型例子。首先，大队向农民宣传“儿童集体化”的伟大意义，解释什么是“我为人人、人人为我”20的共产主义精神，并深入浅出地说明全托可以解放劳动力投入生产，能够更好地教养儿童，以及为何须自力更生，即如何遵循“勤俭办园”的原则。接着，动员有空房的或房子较大的社员，紧缩一部分住房，或移居到其他社员家中。最后，腾出一幢房子共八间，四间做宿舍，三间做教室，一间做炊事房，天井做小活动场，场地上扎竹篱做运动场地和花园。此外，要求入园幼儿的家长自带床及其他日常品，公共用具则是发动妈妈们有什么带什么，发扬集体互助精神。对实在困难、拿不出钱的家庭，就与其他孩子的家长协商合用，不够的部分，由干部发动其他社员适当添一些。最终，办园所需物品都是妈妈们自己送来的，共计24张床（包括床板）、50条被头，每个小孩1只矮凳、2只碗、1只匙。连没有小孩的金大妹也借出了长凳、马桶等。21典型事例的示范，鼓舞各地以共济互助精神大力兴办集体托幼。安乐生产队幼儿园的物资筹备过程亦是如此。该园于1958年11月7日建立，是由大队直接领导，社员在“不花钱”的原则下办起来的。22上海农村之所以在1958年秋季出现集体托幼的高潮，很大程度上是因为广泛推行此法。

无论是房屋还是日常用具，筹备托幼，“物”的基本设施自然是题中之义，而将“人”和“物”两者结合起来考虑也是对建立集体托幼机构的基本要求。“人”的要素首先是要解决部分干部和多数家长“思想不通”的问题。对干部群体而言，并非所有的基层干部都支持开办园所，其原因在于他们认为托幼事业对农业生产不但没什么帮助反而会拖生产的后腿，因而“不划算”。另外，在“热心”办园所的干部眼中，集体托幼对于家长无疑是一件天大的好事，但事实上，多数家长，尤其妇女并不这样认为。尽管她们对于基层传达的集体托幼与自身解放的关系已是耳熟能详，却因“人在田里，心在家里”的切身体验，生出不少顾虑：孩子交给别人看不放心，怕别人照管不好得了病，怕保育员偏心眼等。23

为了“打通思想”，大队干部通常会先在干部会议上通过算细账——“孩子在田里农作物损失，劳动力不能发挥等赔账”——让干部群体在兴办园所的必要性上达成共识。后要求干部召开妈妈或社员会议，设身处地地以妇女的实际家庭经济利益算家庭账。如一家六口人，夫妻二人，一个老人带三个孩子，若三个孩子都送托儿所或幼儿园的话，老人就能去挣工分了。按一个老人一天最少挣5分计算，一个月可得150分。此外，针对妈妈们心理上的不安，以及小孩放在家里容易发生危险和事故等，进一步打消妈妈的顾虑。24不过，做通家长的思想工作并非易事，在农村工作“全面开花”的形势下需要耗费一定的时间与精力。

“打通思想”之余，还须确定由谁来照管园所的幼儿。由于以农业生产为中心任务，当时的保教人员往往由大队干部指定女性半劳动力或辅助劳动力来担任，由此形成“青壮年上前方，老弱做后勤”的人员配备模式。上海县马桥大队、奉贤县南桥大队及松江县张朴生产队53个托儿所和13个幼儿园的保育员绝大多数是老妈妈，年龄最大的72岁，最小的46岁，平均年龄在50岁左右。不少身体残疾，无法参加劳动生产的妇女也当起了保育员。有个托儿所的盲人阿姨王修金，一个人同时带三个孩子。25还有部分园所则是“小囡带小囡”。宝山县红旗人民公社第五生产大队的全托幼儿园由5个教养员负责，年龄最大的21岁，最小的才13岁。26一旦孩子在户外活动，有些保教人员在体力上难免力有不逮。在“物”与“人”的准备环节，两者通常是同步进行的。由于“边组织、边教育、边行动”27，很多园所在短短几天就办了起来。值得注意的是，早在1956年教育部、卫生部、内务部三个部门就曾联合发文，“收3周岁以下的儿童者为托儿所，收3至6周岁的儿童者为幼儿园”28。因此，尽管因陋就简是一贯方针，但这一时期上海农村在开办园所的过程中还是严格遵循了将幼儿园与托儿所分开等相关要求。托儿所以生产队（即一个自然村）为单位开办，便于妈妈接送，幼儿园则以两三个生产队为单位联合开办，或以生产大队为单位开办。

由此可见，人民公社化初期，兴办集体托幼除了具有基本的公共育儿功能，还有塑造农民对共产主义精神的政治认同的任务。农民群体的政治认同之所以重要，在于他们是党和国家确定的阶级柱石之一。需要明确的是，这里的农民实际指的是村民、农村基层干部、园所保育员及幼儿。尽管这些群体有各自的角色，但实质上仍是农民。1958年12月，中共中央高度肯定湖北省委《关于做好当前人民生活的几项工作的规定》，这份文件指出，办好园所“适用于农村，原则上也适用于城市”29，明确了托幼工作先面向农村的取向，更加凸显了这一时期塑造农民对共产主义精神政治认同的重要性。

和多数群众运动一样，上海农村在推行集体托幼时因为急于求成造成了不少问题。特别是为了“赶、学、比、超”与应付上级检查，很多公社不管孩子是否有老人带，家长有无需要、有无意愿，大讲“托儿化”“包下来”，以“强迫命令”的形式组织儿童进园进所过集体生活。30从1958年9月30日实现公社化，到10月下旬，不到一个月的时间里上海农村成立了1400多个幼儿园，4300多个托儿所，480个托儿组，收托儿童10万多人。31当年年底，江苏省苏州专区所辖六县与南通专区崇明县先后划入上海，这时，上海农村地区辖有11个县。32由于所辖区域的扩大以及农村福利工作的持续推进，托幼机构数量及入园入所的幼儿人数就更多了。据上海市妇联统计，这一时期共办幼儿园、托儿所29603个，收托孩子582762名，收托孩子占上海农村学龄前儿童总数的80%。33

人民公社化初期，上海农村在推行集体托幼时虽在“物”“人”等问题上遇到诸多困难，但在动员及组织农民的过程中完成了塑造具有共产主义精神的“新农民”的第一步。尽管，因推广时急于求成而问题渐显，但上海农村的集体托幼并没有停止，反而在“全民托幼”的大潮中真正步入“实践期”，进入一个崭新阶段。这是因为它始终配合农业生产发展34和各个阶段的政策调整而进行，中心工作起起伏伏，农村集体托幼实践便随之波浪式地向前发展。35

二、共产主义议题凸显与农民集体托幼需求（1959年1月—8月）

国家要求在农村大力推行集体托幼，并不意味着所有农民都会自觉参与。一些家长担心孩子在托儿所可能会“吃不饱”，或被大孩子“欺辱”，36上海市委妇女工作委员会发现“儿童实到数往往少于报名数，有时仅及一半”37。多数社员对生产大队负担全部托费不满意，有人发牢骚说：“领这两个囡，工分弄光”38，认为有孩子入托特别是有多个孩子入托的家庭占尽便宜39。之所以出现此类声音，是因为虽然新中国成立前后的土地改革重组了农村权力关系，使农民感受到了国家权威，但尚未改变其劳作和生活秩序，仍然如传统乡土社会时一样生活，没有产生公共育儿的需求。40随着20世纪50年代后期党和国家在农村的中心工作——人民公社化运动的到来，这种秩序才被彻底改变。自此，参加集体劳动与做好分配成为农民的基本任务和利益所在。为了保障个人在人民公社中的利益并维持其运作发展，他们较为普遍地“自动”产生了集体托幼的需求。

站在历史的比较视野来看这一问题，会更加清晰。不独中国共产党，20世纪二三十年代进行乡村幼稚园试验的陶行知和国民政府也曾在推行托幼的同时着力推进集体合作，却都没能将二者结构性地联系起来。

1926年，在关于创设乡村幼稚园问题的文章中，陶行知将平民化视作建立乡村幼稚园的关键之一。如其所言，农民贫且忙，幼稚园应济“农村需要”。41因此，他试图把建设乡村幼稚园与改善农民生计结合起来：通过就地取材、物尽其用解决房屋、用具等设备，以农民生产和生活时间为准安排幼儿活动，并将幼儿的康健放在第一位，以此实现幼稚园“下乡”。42陶行知建设乡村幼稚园的初衷在于通过解决农村幼儿集中教养问题纾解农民的“穷愚”困境，带有造福农民的公益性和福利性。但由于时局动荡与经费短缺，以及农村生产方式43等诸多因素，陶行知的乡村幼稚园试验没有也不可能促生农民内在的托幼需求。在此情况下，他力图通过建设乡村幼稚园为国育儿的愿望也只能止于零星试验。

20世纪30年代国民政府主导的托幼事业，其政策体系包括宗旨、课程、经费、管理及师资等，无疑为托幼的发展提供了政策保障，这套政策体系的建构是“自上而下”与“自下而上”相结合，既有中央政府为主导的取向，也有专家学术研究成果的推进，如陶行知等的幼稚园试验对中央政策的制定助益良多。44因而，在相当程度上，国民政府认可了陶行知等人乡村幼稚园建设的理论及实践，只不过注重为推行托幼提供政策保障的做法仅在表面上加深了乡村幼稚园建设的“国家化”程度，并没有改变民间“自治”的格局。45诚如时人所批评的，在乡村，大部分的合作都被豪绅所把持，外界无法跳过他们去直接组织农民。46国民政府虽曾试图将政权下沉至乡村，但因轻视乡村又缺乏动员能力，以失败告终，47即国家权力难以“下乡”。概言之，虽然国民政府加大了政策层面的介入力度，但仍属民间“自治”的乡村幼稚园注定很难与农民的集体托幼需求结构性地联系起来。九一八事变后，国民政府的托幼事业虽有发展，但极不平衡。1932年，即便在托幼机构较为发达的上海，其幼稚园的幼儿总数也只有1045人，48而上海县、青浦县、南汇县、松江县、金山县、川沙县保姆所幼儿数合计仅190人，49乡村托幼严重落后。此外，当局更将扩大托幼规模与“培养教化国民”联系起来，50在未能创造出农民内在的集体托幼需求的情况下，此等流于空泛、无所着落的说教51预示着乡村托幼只能停留在专家的实践层面。

反观20世纪50年代后期，中国共产党在农村推行集体托幼时，在国家强力主导下，首先将分散的小农全部组织到共产主义的人民公社之内。实践中，开始时大办集体托幼的热潮和随后的整顿总体上还算稳步、健康地发展。(52)1959年3月以后迅猛发展，全国形成集体托幼高潮，这使“共产主义”成为农民不得不面对的严肃议题。对于一些无子女农民的不满，干部教育他们要基于集体利益，且要有长远眼光，认识到“办托儿所、幼儿园是人民公社的优越性，现在没有囡，将来有囡，现在没有囡，下代有囡”53。一些保育员认为自己工分低，还被家长看不起，情愿去田里生产，干部就开展“我为人人、人人为我”的辩论，讲明带孩子比生产更重要，使她们明了保育事业的意义，树立托幼工作的光荣感。54此种情形非上海独有，全国各地屡见不鲜。如河北省徐水县就“小孩子要不要由公社来抚养”展开辩论，没有小孩的社员认为自己“吃亏了”，不同意开办，有小孩的社员当即反驳：“你现在养活俺小孩，将来你老了还不是由俺小孩养活你吗？”结果，那些怕吃亏的人最终被“辩倒了”。55所谓“辩论”，事实上就是毛泽东批评的“动不动‘辩你一家伙’”56，实际上已容不得落后分子。上海市嘉定县要求办幼托的大字报就占了50%，奉贤县的家长一定要把未满3岁的幼儿送入全托，为解决孩子穿衣服的后顾之忧，还把布票也交给了幼儿园。57无论抱怨还是从众，对于农民来说，传统的劳作和生活秩序发生了彻底改变。既然进了共产主义的人民公社，就要参加集体劳动并做好分配。这意味着在家带孩子就少了参加集体生产的劳动力，影响全家拿工分及年底收入，于是，农民在传统乡土社会所没有的集体托幼需求被激发出来。

概言之，与陶行知和国民政府推行的乡村幼稚园相比，中国共产党主导的人民公社化运动在凸显共产主义议题的同时，结构性地激发出农民集体托幼的需求。此种需求虽然源于国家这一外部性动力，却因为农民难以选择而成为其内在需要。毛泽东认为，小孩进托儿所，教育、食宿等都由社会负担，不是破灭家庭，而是废除家长制。58这也回应了费孝通对传统乡土社会孩子由家庭抚育的思考。费孝通曾明言，若“以家庭和保育院来比较的话，大体上家庭里所生长出来的孩子比较健全些”59。他还认为，乡土社会的农民只有在偶然或临时的非常态中才需要伙伴和团体60，除非中国社会乡土性的基层发生了变化61。如果说20世纪50年代中期的农业合作化运动所带来的正是这样一种根本性变化的话，那么50年代末的人民公社化运动所造成的农村集体托幼则是这种根本性变化的更高表现形式。

关于人民公社化之后农民迫切需要集体托幼的原因，上海市幼儿保健教育委员会的调查报告极具代表性：

对于有孩子入托特别是有多个孩子入托的家庭而言，孩子越多，用在孩子身上的补贴就越多。日托尚且如此，贴粮贴钱的全托对孩子的补贴则更多，生产大队除补贴25斤左右粮食外，每年用于一个全托儿童的费用更是多达30—40元。62

类似的情况还发生在七一公社联明生产大队。由于是棉、粮、菜夹种地区，该大队较为富裕，一个劳动力全年平均收入为272.28元，63如果以此为基数，补贴一个全托儿童的30元—40元相当于一个劳动力一年收入的11%—14.7%。也就是说，一个家庭如果有一个孩子进全托，其所耗补贴相当于一个劳动力一个多月的收入。事实上，有些家庭不止一个孩子进全托。加之，托儿所、幼儿园的日常开支由大队公益金担负，在家长负担一定工分后，大队全年托幼经费的支出占公益金的65.6%，远高于困难户补助、工伤等其他福利性开支。64由于集体托幼实行包下来的政策，特别是“吃饭不要钱”65，在上海农村，甚至出现这样的情形：

不少夫妻一方或两人从事非农业生产的职工家庭以及有亲戚在农村的家庭也将孩子送入托儿所幼儿园中，甚至有少数人代人领养，自己却“拿了领养费”。对于这些未参加集体生产劳动的家庭而言，无疑享受了与社员一样的福利。66
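前文以联明生产大队劳动力年均收入272.28元为基数所作的折算，可以用下面几行代码复核。这只是一个核对用的算术草稿：变量名均为假设命名，“月收入”按年收入平均分摊到12个月估算，并非档案中的原始口径。

```python
# 复核全托补贴占劳动力年收入的比重；数字取自上文，变量名为假设命名
annual_income = 272.28                 # 一个劳动力全年平均收入（元）
subsidy_low, subsidy_high = 30, 40     # 每年用于一个全托儿童的费用区间（元）

share_low = subsidy_low / annual_income
share_high = subsidy_high / annual_income

# 按年收入平均分摊到 12 个月，估算补贴相当于几个月的收入
monthly_income = annual_income / 12
months_low = subsidy_low / monthly_income
months_high = subsidy_high / monthly_income

print(f"占年收入比重: {share_low:.1%} 至 {share_high:.1%}")
print(f"约合月收入: {months_low:.1f} 至 {months_high:.1f} 个月")
```

按这一估算，一个全托儿童所耗补贴约占劳动力年收入的一成至一成半，折合一个多月的收入。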

农民内在的集体托幼需求正是在这样的情境下被激发了出来。换言之，人民公社化初期的集体托幼成为农村干部群众接轨共产主义的重要工具。

国家不仅激发了农民集体托幼的需求，还通过兴办更完善的托幼组织充分保障这种需求的实现。1959年3月，全国妇联提出“在具有基本条件的地方”必须“积极办好全托”之后，上海市妇联要求配合人民公社化运动和农业生产发展需要，切实将办好全托纳入全面规划。67不仅将原有的临时性、季节性的托儿所全部转为日托，还兴办了大量新的日托与全托。大量园所的兴办确实解决了不少农村妇女生产牵累与孩子照管的难题，使女性安心投入生产。至此，上海农村的集体托幼真正成为一项“运动”。4月，仅嘉定、奉贤、松江三个县，共办起5539个托儿所，11个县共办15635个托儿所，入托幼儿达到233999名。68

然而，1958年年底至1959年年初麻疹的流行，打乱了工作步调。由于卫生知识、传染病预防及隔离条件缺乏，致麻疹传染面扩大，造成不少幼儿死亡。69家长们非常惊慌，纷纷把孩子抱回家，致使托儿所、幼儿园缩减。不久，上海召开五级干部会议贯彻第二次郑州会议精神，“精简生活服务人员到7%左右”以补充、加强农业生产战线。在此形势下，奉贤县一生产队队长认为劳动力如此紧张，却让十几个人带孩子“不合算”，急欲把幼儿园“砍掉”，将省下来的保教人员充实到大田上去，70部分托幼机构形成了“无人照管孩子”的局面。这些言行不免偏颇，但由此也可以看出，生活还是要让位于生产。这种影响在1959年夏季全面体现出来，上海农村托儿所和幼儿园数量出现下降。如金山县新农公社，原有109个幼儿园，收托 1000 多名孩子，在五级干部会议后仅留下13个幼儿园，收50多名孩子，下降90%。71

三、在革命口号中促使农民奔向共产主义（1959年9月—1962年）

1959年庐山会议后，中央层面开始“反右倾”斗争，这为处于低迷的农村托幼事业带来了转机。在“反右倾”的总形势下，农村福利工作的“倒退”往往被视为“右倾”的表现。1959年9月，上海开始着手恢复、提高农村福利方面的工作，再次确立“一手抓生产，一手抓生活”的工作方针。为了强调政治挂帅，各级党委书记都亲自抓托幼工作。72在干部群众中间，针对错误观点或行为进行了思想教育。73最为常见的教育方式是“调查”。例如，川沙县对一些公社进行调查后，发现不少家长“有需要而未送托”，在深入了解原因的基础上，不仅教育他们应关心儿童的安全，还设法帮助解决实际问题。74据说这个办法很灵验，通过调查及时发现并解决各种问题，使很多家长、保教员及基层干部都期盼将日托转为全托。75

“反右倾、鼓干劲”开始后，上海市委在基层公社紧锣密鼓地宣传八届八中全会精神，要求各项工作“鼓足干劲”76，提出了不少脱离实际的园所建设任务和办园目标，农村再次掀起“全面跃进”的高潮，加强对托幼的领导成为基层干部的重要工作。如南汇县惠明公社明六生产队周水连听了会议精神传达后，回去立即办了4个托儿所和4个幼儿园。77

在“大跃进”的氛围下，为了不断开办新幼儿园以完成高指标，1959年12月，上海市教育局制定《农村幼儿园民办公助办法（草案）》（以下简称《办法》），要求继续在做好师资培养，提供教材，充实公社幼教干部，发动“公带民、老带新”等工作的基础上，对一些经济困难的幼儿园给予一定的补助，并对办得好的幼儿园给予一定的奖励，以促进农村幼儿园的繁荣和发展。如补助新建幼儿园3元—5元，补助困难幼儿园每班每季度不超过6元，每季度被评为先进的幼儿园奖励不超过10元。此外，还规定补助及奖励应作充实教育设备之用，如教养员用的参考资料、图书、教具，或简单的卫生医药箱、毛巾、脸盆、肥皂等。78该《办法》的出台是自1958年以来，上海市财政第一次大规模地对农村人民公社开办的园所给予资金补助。在当时财政极为有限的情况下，这项补助极为可贵，充分表露了政府对托幼工作的期待，在一定程度上也有助于解决实际的资金困难。随着《办法》的逐步落实，上海的农村托幼事业迎来了1960年的大发展。

1960年4月，为了掀起集体托幼的新高潮，上海市委周密部署，各部门高度重视并做了大量工作。9日，时任上海市委第一书记的柯庆施提出上海必须“逐步分批实现公社化”79；中旬，上海市卫生局与教育局向上海农村派驻“幼托事业工作队”80，宣传托幼的意义，培训保育人员，组织示范教学，制定规章制度等81。该工作队抓住薄弱环节，帮助兴办托幼，同时也起到了“督促”公社及大队干部的作用。如奉贤县肖塘公社原来只有7所幼儿园，在工作队的帮助下办起了82所幼儿园，托儿所也从97所增加到172所。82当月，上海成立市幼儿保健教育委员会，旨在加强对托幼工作的统一领导。83不久，该委员会向上海市委提交了《关于幼儿园、托儿所发展情况和今后打算的报告》，在肯定前期依靠“穷办法、土办法”兴办托幼事业做法的基础上提出，今后必须坚持“边发展、边整顿、边巩固、边提高”的原则，才能促进托幼事业“多、快、好、省”地发展。此外，它还对农村托幼作出系统性规划：“县、公社、生产队各级的幼托工作组织，层层要有专人负责，以加强对这项工作的领导”，要求农村入园入托儿童国庆前达到85%以上。为了完成指标以“出色的成绩迎接国庆”，须掀起三个高潮：结合“六一”评比表扬先进儿童工作者和先进儿童工作集体，造声势、树标兵；7月，对幼儿园、托儿所，组织一次夏令卫生工作大检查，以提高卫生保健水平；9月，再组织一次托幼工作的全面性对口检查。84这些部署不可谓不细致周到。

为了完成上述指标，上海农村各县首先指定专人负责托幼工作的领导。金山、青浦、松江等县设立生活福利委员会85，嘉定、上海、川沙、浦东、奉贤等县成立了生活福利办公室。通过“六一”评选工作，南汇、崇明两县又在生活福利办公室下设托幼小组或托幼办公室领导托幼工作，宝山县则专门建立了托幼委员会。这些职能部门或小组的建立，使得托幼工作经常被提到议事日程并做统一部署。基层公社也专门配备了负责托幼工作的干部。如泥城公社的党委书记就对托幼工作做到“五抓”（一抓干部群众的思想教育；二抓规划，做到心中有数；三抓统一安排，安排生产的同时安排托幼工作；四抓具体问题，如粮食、房屋、设备、工分等；五抓专线领导，层层有人领导，副书记挂帅）。86其次，掀起检查评比、树立标兵的浪潮。公社频繁组织各类检查评比活动，在检查评比后，将检查的情况与各生产队发展托幼的进度表，分发至各生产队，激起“落后”生产队的赶超想法。如金山县紧紧抓住兴塔公社红旗幼儿园这一标兵，开现场会交流经验，全县掀起了“学兴塔、赶兴塔、超兴塔”的高潮。87此外，在浦东、上海、川沙、宝山、松江、崇明、嘉定等县的25个公社和7个镇的妇联联合倡议“积极发展幼儿园，做到凡是无人照管的儿童全部入园入所”之下，大办托幼的友谊竞赛在这些区域不断涌现。88

在上下贯通的组织领导下，至1960年6月，上海农村掀起了一股大办托儿所、幼儿园的热潮。据《文汇报》称，仅嘉定一个县，一个多月内入园入托幼儿就增加了18000多名，当月，全县入托入园幼儿占学前儿童人数的70%以上。其中，势头较好的如城西、封浜等公社这一比例更是达到90%。89又据上海市妇联农村工作部1960年4月统计，农村入托入园人数上升到 40 万。90另据统计，1960年6月中旬，上海农村共有托儿所20201所，收托幼儿251338人，共有幼儿园8354所，收托幼儿26216人，托儿所的收托比例提高到80%，幼儿园的收托比例提高到72.6%。91在以指标为先的“跃进”氛围中，这些数据可能有浮夸成分，但也从侧面呈现出农民在革命口号下奔向共产主义的历史情境。

“大跃进”期间，高指标、浮夸风并非农村集体托幼所独有，但这项工作似乎又有其独特价值，当时的革命口号精炼地体现了这种价值——“一夜托儿化”，“实行寄宿制，消灭三大差别”92。从某种意义上讲，彼时的农村集体托幼在这种夸张的“革命”氛围中，已然浮现毛泽东所期待的“六亿神州尽舜尧”93的美好图景。

实际上，“大跃进”期间，上海农村的园所大多是匆匆上马，存在诸多问题，比如保育员的卫生知识、业务能力、托儿所设备、环境卫生、管理水平等跟不上，加之其他条件限制，前面描绘的集体托幼成效恐怕与真实情况存在不小落差。当时就有人质疑1958年的集体托幼，比如，“囡多占便宜，我们负担领囡费，做来做去担几个共囡，啥叫按劳取酬”94，引起了有子女入托家庭和无子女入托家庭之间的矛盾95；有的因托儿所“路远不便”，认为没必要再办96；有的在孩子入托后不久就要接回家，态度还十分强硬97。出现这些现象的原因或许是托幼工作不够扎实而使农民心生怨言，但也从侧面说明上海农村此时仍未实现以集体托幼形式塑造共产主义新农民的目标。

尽管“大跃进”时期的集体托幼成果存在一定的虚报浮夸，98但上海农村的集体托幼实践还是取得了一定成效。以上海农村某生产大队为例，自1958年人民公社建立的三年里，该大队由原来1个农忙托儿所发展为3个常年托儿所。幼儿在托儿所里生活得很健康，也未发生过重大事故，家长说“有了托儿所孩子高兴，妈妈也可安心做生活”99。通过兴办集体托幼，上海农村妇女不仅摆脱了孩子的拖累，全心搞生产，还有时间学习文化并脱盲。其中，有的人当上了保育员、教养员和妇女干部；100有的人以往毫无卫生知识，现在能当保育员；有的人过去一字不识，如今能当教养员；有的人以往从不关心政治，现在当起了妇女干部。这不仅大大解放了妇女，还提高了人民公社托幼事业的工作质量和业务水平。101仅1958年下半年，上海市妇联与卫生、教育等部门联合训练公社托幼工作干部就达 250 余人，102这些人能够在人民公社中胜任各自的工作，客观上展现了农村集体托幼实践的效果。

1961年，中央提出“调整、巩固、充实、提高”的方针之后，人民公社迎来了大调整，生产工作成为农村的中心任务，托幼工程自然退居“次要”。此后，农村集体托幼进入常态化阶段。随着工作重心的转移，1962年8月，上海市幼儿保健教育委员会撤销103，农村托幼工作恢复到由上海市妇联农村工作部主管，该部将办园办所的决定权下放给社员群众，托幼机构“办不办，怎么办，办什么样子的”成为群众自己决定的一项事务。104不难发现，上海农村托幼实践进行到此时，已经因为中心工作的变化而承载起新的政治话语。“大跃进”时期培育“共产主义接班人”的宏大历史进程也进入尾声。

四、融入托幼日常的具体措施：培育“共产主义接班人”

1959年六一儿童节，《人民日报》发表社论，号召将托儿所和幼儿园办成“培养共产主义事业接班人的基地”105。上海在推动农村集体托幼实践落地生根的过程中，既要有切合受教育对象特点的动员技术，又要根据“因陋就简”的现实条件做出周密安排，在此基础上，再为具体的国家任务服务。上海实施若干措施，将共产主义从方方面面点滴渗入农民的日常生活，对“接班人”的共产主义塑造愈发深入。

（一）注重卫生保健

在对农民进行集体托幼的有效动员之后，还需培训具有一定业务能力的保育员，这是农村集体托幼能够顺利开展的必备条件。1958年年底以前，由于多种原因，园所的保育员难堪重任，尤其缺乏卫生保健知识。对于实质上仍是农民，业余充当保育员的群体，需要经常性地开展培训工作。1959年春，上海农村开始建构由文教、妇联及卫生部门三者相互协作、配合的培训体系，在编写保健知识丛书的基础上，由各级医务系统如县医院、护士学校、妇幼保健所及公社医院主导，分层分批对保育员进行脱产或不脱产、定期或不定期、长期或短期的卫生保健知识培训，106如培训保育员必须学会和做好除“七害”107、讲卫生，晨间检查，预防传染病，孩子有病会隔离和报告，培养孩子的卫生习惯，孩子的饮食和卫生，安排孩子的生活，各种消毒工作，保护孩子的安全，教养孩子等十件事，使其逐步达到初级保育护士水平。108

事实上，要做好幼儿疾病预防及应急处理，仅凭培训保育员这一项措施是远远不够的。因此，在“预防为主、防治结合”的要求下，公社还依托地段与区域的专业医务力量，搭建幼儿疾病预防与治疗平台，通过预防接种，建立上下贯通、层层负责的卫生保健网，并以此作为示范向全县推广。如川沙县蔡路公社幼儿园与卫生院取得联系，通过卫生院妇幼科医生或保健员经常来园做卫生保健的业务指导，建立定期检查制度，并按时进行预防接种。109自1959年卫生保健网建立后，对保教人员及专业医务人员执行卫生保健措施起到了监督、引导与帮助的作用。同时，该卫生保健网在预防幼儿常见传染病上也产生了良好效果110，一旦出现病孩也能及时治疗111，幼儿因病致死的情况很少发生。1960年，园所的麻疹发病率较1958年同期成倍下降，很多园所甚至一年来都没有发生过麻疹。112

此外，上海市妇联农村工作部于1959年3月制订了《关于农村托儿所、幼儿园工作暂行条例（草稿）》，其中，不少详尽的规定为幼儿的卫生保健提供了制度保障。比如，在园所选址方面，必须选择兼顾幼儿安全和方便家长接送的地方，应注意平坦宽敞、清洁卫生、空气流通、阳光充足，水塘、河沟、畜圈、马路及医院等旁边不宜设园所；在食具方面，幼儿的开水壶、碗筷须自备一套，并设自来水冲洗脸、手，避免传染疾病；在预防保健方面，定期为幼儿预防接种和健康检查，有病须立即隔离；在生活制度方面，吃饭、睡眠、游戏、洗脸要有秩序。每日晨间检查，食具、玩具每周须消毒1至2次，饭前便后要洗手。113据上海市妇联农村工作部检查，有1/4的园所落实较好114，这些规定促使幼儿养成了清洁卫生的习惯。115一些模范幼儿园如宝山县吴淞公社卫星幼儿园的日常保健常态化，每日写生活记录，随时掌握幼儿的吃睡、大小便情况，还设有隔离室。116以上举措可谓具体而微。

应该说，通过卫生保健的宣传、人员培训、体系建构及制度落实等各个环节，农民对于以“新农民”之姿投入托幼卫生保健事业的认知还是有所提高，学习保健知识及养成卫生习惯的积极性也随之被激发出来。农民与集体、国家之间的联结被强化的同时，对共产主义的认同也会显著加深。

（二）优先供应饮食

在构建卫生保健体系的同时，还需优先供应幼儿的饮食。“身体是革命的本钱”，幼儿茁壮成长、体魄强健是推行农村集体托幼的必然要求。正是出于这种考虑，基层干部对幼儿的膳食给予了极大关照，在饮食供应方面，托幼组织在同期的农村福利组织中处于被优先供给的地位。

上海市妇联农村工作部要求幼儿饮食必须由专人负责，单独做适合幼儿年龄的饭菜，并按50∶1的比例配备炊事员。117幼儿的一日三餐要与成人饮食有所区别，且专门烧各式小菜，以换口味。就连幼儿平日喝的开水，洗脸洗脚用的热水，也都由食堂供应，以减少疾病的发生。118并且，根据幼儿年龄定粮，实行“专人管理、计划用粮”。如奉贤县三官公社胡村生产队的全托幼儿园每人每月定粮，大的孩子（虚岁7岁—8岁）每天12两左右，小的孩子（4岁—6岁）9两左右。119“吃得饱”这一要求基本能得到保障，甚至还略有积余。

在“吃得饱”的基础上，还讲究营养均衡、荤素搭配，即“吃得好”。一些生产队特意为全托幼儿园配置了一定面积的自留地，供保教人员耕种以提高幼儿的伙食水平。如南汇县大团公社沙庙生产队的全托幼儿园自种高粱、黄豆、玉米、芋艿、卷心菜、红萝卜等，做到了生产队不再贴粮食，蔬菜也可基本自给。120这些粮蔬专供幼儿园使用，不用上交集体。为了保证蛋白质的供应，人民公社还通过各种手段尽量予以满足。不少时候，由生产大队专为幼儿购买含有蛋白质成分较高、易于消化吸收的鱼、蛋、肉类等食物。121园所自养的家禽家畜也可稍作补充，改善幼儿伙食。如宝山县新生生产队幼儿园养了鸡、鸭、猪、羊和兔子，每逢节日都可以吃到荤菜122，有些幼儿园甚至隔一天就能吃到荤菜123。此外，不少公社的供销部专为幼儿提供一定量的糕点、点心、糖果、饼干及线粉124，在夏季，还为幼儿供给特定的食物或饮料以防暑降温125。至于经费，一般由大队公益金担负。126公益金不足时主要靠农民自己解决，保教人员、食堂等工作人员也主要来源于农民，这使得农村集体托幼看似颇有群众“自主办园”的味道。但很显然，这场实践的主导力量、决定性因素还是党和政府。

由于公社对托幼饮食及营养方面的重视，园所的幼儿一般比散养在家的幼儿待遇更好，有生产队队长称“小囡粮食够吃，荤菜、红枣、饼干样样优先供应，比家里养得壮多了”127。在缺粮少食的困难时期，这些共产主义“幼苗”无疑享受着优先待遇。一个显著的事实就是，送进全托的孩子除本人口粮外，生产队给每个孩子每月平均补贴2斤左右的粮食，散养幼儿则没有。也难怪一些没有子女入托的社员自认为“吃亏”。128应该指出的是，对托幼“优先供应饮食”是符合实际的合理安排。广大农民在回报很少的情况下兢兢业业投入集体托幼实践，其历史作用不应被低估。即便有不足，主要也源于难以逾越的历史条件限制，不应过于苛责。

（三）强调全面教育

集体托幼实践的基本诉求在于“教养结合”以培育共产主义接班人。所谓教养结合，即仅关注幼儿的卫生保健、饮食营养是远远不够的，还必须加入“教”的因素。上海市教育局明确要求从幼儿园的性质和“两大任务”129出发，强调幼儿园既是福利机构又是教育机构，特别批评了对“教育机构”性质认识不足的问题。130为此，在幼教人员的配备上，上海市妇联农村工作部要求个人成分、身体状况、工作态度等基本条件满足外，还特别强调“最好是具有一定的文化水平的青壮年”担任。131总的来看，这一时期农村托幼在培养德育和智育方面的“接班人”这一问题上做出了许多尝试性探索。

为了培育“德才兼备”的共产主义接班人，园所十分注重幼儿的德育。在德育方面，主要包括集体主义教育、人民公社的认同教育及热爱国家与领袖方面的教育。中国福利会幼儿园不仅编写出版书籍，还利用游戏及表演让幼儿明白集体主义的好处与意义，如“集体劳动生产出来的农作物，比个人劳动生产出来的农作物既多又好”，同时将集体主义意识落实在幼儿日常行为中，要求各班互助浇水施肥，“共享”劳动成果，提醒孩子“长出来的菜是大家的，不分你的我的”。132此外，教养员时常带领幼儿观察公社出现的各种新兴事业和群众福利事业，如工厂、食堂、敬老院、俱乐部等，使孩子们懂得“有了共产党和毛主席，成立了人民公社，人们的生活将越过越好”133，教幼儿学唱《打夯歌》《人民公社真正好》《国旗歌》《爱毛主席》等儿歌134，要求中班孩子学会写“毛主席”三个字135。

在智育方面，要求逐步培养幼儿的感官、语言、思维、动手等诸多能力。1960年前后，上海市教育局出版了大量幼儿教材与教辅资料，要求幼儿学习拼音、认识汉字及学会计算，并针对幼儿年龄，提出了不同层次的要求。如小班学会1—5的数的概念，中班学会10以内的加减法，大班学会20以内不进位、不退位的加减法，学会口编应用题等。136其中，南汇县惠南幼儿园在计算教学方面被列为先进，该幼儿园在教学思想上实现了智育与德育两方面的紧密结合，联系政治形势，生活实际以及生产实际进行计算教学。例如，给幼儿讲“人民公社好”时，把平时使用的教具和增添的新教具结合起来反映农村全面发展，表现农民生活水平的提高，并采取多样化、针对性的教学形式，用蜡光纸剪成钢铁元帅、棉花姑娘、麦公公之类等进行计算教学，用实物和玩具、卡片进行计算，用计算木架数数、计算等。137凡此种种或许可以说明，上海农村集体托幼实践在贯彻落实各项国家任务方面发挥了积极作用。

无论是幼儿，还是从事托幼事业的工作人员在承接以上三项具体措施的过程中，感受到的不再是宏大高远的共产主义意象，而是融入日常的共产主义。由此，农村集体托幼实践对农民的共产主义塑造润物无声地走向具体深入。

五、结语

从全国范围看，人民公社化时期的农村集体托幼实践成绩斐然，帮助大量农民，尤其妇女，有更多精力投入农业生产。但这场集体托幼实践并非只为实现家务劳动社会化以解放妇女，更有着力塑造共产主义新农民之意图。通过考察1958年至1962年的上海集体托幼实践可以发现，由于这场实践始终跟随党和国家各个时期的中心工作，而且实行了注重卫生保健等具体措施，其对农民的共产主义塑造体现为一系列的革命性实践。

人民公社化初期，无论是推行“儿童集体化”还是倡导“我为人人、人人为我”，都将农村集体托幼与塑造农民对共产主义精神的政治认同结合起来。人民公社化运动在凸显共产主义这一严肃议题的同时，结构性地激发出农民集体托幼的内在需求，集体托幼实践由此真正成为一项“运动”，集体托幼成为农民接轨共产主义的重要工具。“大跃进”及其后的常态化集体托幼促使农民在革命口号中奔向共产主义。而培训保育员，构建卫生保健网，优先供应幼儿饮食，以及强调全面教育，则是推行农村集体托幼的具体措施，共产主义由此融入农民的日常。这些革命性实践为农民诠释出集体主义、人民公社、爱国、爱领袖等诸多意涵，促使他们形成对共产主义的政治认同。农村集体托幼实践塑造共产主义新农民的丰富历史图景也由此展开。

最后，似有必要对农村集体托幼实践塑造出的共产主义新农民的内涵作进一步讨论。前已述及，新农民不仅包括有孩子的农民、无孩子的农民，也包括农村基层干部、园所的保教人员，还包括园所的幼儿。对这些看似不同群体的农民而言，集体托幼都有一个从最初的诸多顾虑发展成为其内在需求的过程，且这种需求在“大跃进”的情境中达到最高潮。站在国家的角度来看，无论有孩子的农民还是无孩子的农民，都是为了更好地担负起他们作为“国家的农民”的责任，农村的基层干部与园所的保教人员为的是更好地完成“国家的托幼工作”，园所的幼儿则是为了更好地成为“国家的接班人”。因此，虽然陶行知和国民政府的乡村幼稚园实践与人民公社化初期中国共产党主导的农村集体托幼实践都担负着“为国”的重任，但只有后者才能有效完成这一任务。

本文转自《开放时代》2024年第6期

2024-12-02
黄玉顺：杨叔姬：辩证美恶的春秋女哲

杨叔姬（生平不详），杨氏，晋国大夫羊舌职（？—前570年）之妻，羊舌肸（xī）（叔向）之母，史称“羊舌叔姬”。孔颖达说：“羊舌，氏也，爵为大夫，号曰‘羊舌大夫’。”[②] 杨叔姬之“姬”并非姓氏，因为其丈夫羊舌职为姬姓，同姓不婚，则杨叔姬不可能姓姬；“姬”是古代女子通用之美称，犹如“子”是古代男子通用之美称。至于杨叔姬的“杨”，究竟是其父族姓氏，还是其夫族姓氏，暂无定论。或以为羊舌氏即“杨氏”，因为叔向食邑在杨（今山西省洪洞县东南）。[③] 如《左传》“晋杀祁盈及杨食我”杜预注：“杨，叔向邑”[④]；又“分羊舌氏之田以为三县”孔颖达疏：“伯石（叔向之子）为杨石，明杨氏是羊舌之田也”[⑤]。但孔颖达却又说：“《谱》云：‘……羊舌，其所食邑也。’”[⑥] 因此，叔向的食邑究竟是“杨”，还是“羊舌”，待考。

但杨叔姬的儿子叔向，即杨叔姬与羊舌职的次子羊舌肸，却是春秋时期大名鼎鼎的人物，姬姓，羊舌氏，名肸，字叔向，又称“叔肸”“杨肸”，晋国大夫，乃是当时著名的政治家，与郑国的子产、齐国的晏婴齐名。

杨叔姬的事迹，见于《左传》《国语》及刘向《列女传》等。

一

杨叔姬是一位极具智慧的女性，这主要表现在她对丈夫羊舌职加以规劝和对儿子羊舌肸加以训诫的言论之中。

（一）文献的记载

刘向《列女传》记载的杨叔姬对丈夫的规劝：

羊舌子好正，不容于晋，去而之三室之邑。三室之邑人相与攘羊而遗（wèi）之，羊舌子不受。叔姬曰：“夫子居晋，不容；去之三室之邑，又不容于三室之邑，是于夫子不容也，不如受之。”羊舌子受之，曰：“为肸（xī）与鲋亨（pēng）之。”叔姬曰：“不可。南方有鸟，名曰乾吉，食（sì）其子不择肉，子常不遂。今肸与鲋，童子也，随大夫而化者，不可食以不义之肉。不若埋之，以明不与（yù）。”于是乃盛以瓮，埋垆阴。后二年，攘羊之事发，都吏至，羊舌子曰：“吾受之不敢食也。”发而视之，则其骨存焉。都吏曰：“君子哉！羊舌子不与（yù）攘羊之事矣。”君子谓叔姬为能防害远疑。[⑦]

这里“羊舌子”即指杨叔姬的丈夫羊舌职，故下文杨叔姬称其为“夫子”。“好正”意谓正直。三室之邑，地名，不详。据《左传》载：“天子建国，诸侯立家，卿置侧室，大夫有贰宗，士有隶子弟，庶人、工商，各有分亲，皆有等衰。”郑玄注：“侧室，众子也。”孔颖达疏：“正室是適（dí）子（嫡子），故知侧室是众子，言其在適子之旁侧也”；“其侧室一官，必用同族，是卿荫所及，唯知宗事”。[⑧]《左传》“赵有侧室曰穿”郑玄注：“侧室，支子”；孔颖达疏：“正室是適子，知侧室是支子，言在適子之侧也”；“（赵）盾为正室，故谓（赵）穿为侧室”。[⑨] 此说可资参考。

攘，偷窃。遗，馈赠。“肸”指羊舌肸（叔向）；“鲋”指羊舌鲋（叔鱼），羊舌职和杨叔姬的儿子，即叔向的同母弟。亨，同“烹”。上古“烹”“享”“亨”不分，作“亯”，许慎《说文》解释：“亯，献也”；“象进孰（熟）物形。《孝经》：‘祭则鬼亯之。’”徐铉注音：“许两切，普庚切，许庚切。”[⑩] 乾吉，鸟名，出处不详。食，喂养。遂，成长。“大夫”指羊舌职。“化”，变化。“随大夫而化”，意谓儿子会受父亲的影响而变化心性。“不与”，没有参与。垆，通“庐”；垆阴，屋后。都吏，都邑的官吏。

（二）“不可食以不义之肉”的哲学意义

刘向赞誉杨叔姬“能防害远疑”，纯粹是从“明哲保身”的功利角度而论；其实不仅如此，杨叔姬强调“不可食以不义之肉”，乃是一个涉及“义利之辨”的问题。这是一个重要的中国思想传统，例如，《左传》开篇即载：“大（tài）叔（共叔段）又收贰以为己邑，至于廩延。……公（郑庄公）曰：‘不义，不昵，厚将崩。’”郑庄公还指出：“多行不义必自毙。”[11] 这是说共叔段的贪利忘义，必将不得善终。

至于杨叔姬所谈及的怎样教养儿子的问题，卫国大夫石碏（què）也曾指出：“臣闻：爱子，教之以义方，弗纳于邪。”[12] 这里的“义”“义方”，孔颖达解释为：“义者，宜也。教之义方，使得其宜。”[13] 诚然，“义”经常可以释为“宜”。例如《中庸》也这样讲：“义者，宜也。”[14] 不过，“义”也常释为“正”。“义”兼“正”与“宜”二义，后来成为儒家正义论的两条基本的正义原则，即正当性原则和适宜性原则。[15] 石碏这里所谈的“义”，乃是与“邪”相对而言的，显然意谓“正”，诚如孟子所说：“义，人之正路也。”[16] 这与杨叔姬所要表达的意思是一致的，她所警诫的“不义”，是指接受“攘羊”，正是说的不正当、非正义。

这种“义利之辨”的思想传统，后来孔孟儒学特别加以发挥，朱熹称之为“儒者第一义”[17]。如孔子说：“君子喻于义，小人喻于利”[18]；“见利思义”[19]；“不义而富且贵，于我如浮云”[20]。《孟子》开篇就讲：“何必曰利？亦有仁义而已矣。”[21] 孟子的问题意识是：“其所取之者，义乎，不义乎？”[22] 他主张“非其义也，非其道也，一介不以与人，一介不以取诸人”[23]；“不义之禄而不食也”，“不义之室而不居也”[24]；否则，“君臣、父子、兄弟终去仁义，怀利以相接，然而不亡者，未之有也”[25]。对于儒家这种“义利之辨”的思想来说，杨叔姬乃是其先驱之一。

二

刘向对杨叔姬的赞誉，主要是突出她洞察人性、推知人生、预见命运的智慧，从而“颂曰：叔向之母，察于情性，推人之生，以穷其命”[26]；但实际上，杨叔姬的言论所蕴含的思想意义远不止此。

（一）文献的记载

据《国语》载：

叔鱼生，其母视之，曰：“是虎目而豕喙（huì），鸢（yuān）肩而牛腹，谿（xī）壑可盈，是不可厌也，必以贿死。”遂不视。[27]

叔鱼，羊舌鲋，叔向的同母弟弟，晋国大夫。“其母”即杨叔姬。虎目，指涉贪欲，出自《周易》“虎视眈眈，其欲逐逐”[28]。鸢肩，像鸱鸟两肩上耸，形容其丑陋。牛腹，指其胃口很大，与下文“谿壑可盈”相呼应。“谿壑可盈，是不可厌”，犹今所谓“欲壑难填”。韦昭注：“（叔鱼）后为赞理，受雍子女而抑邢侯，邢侯杀之”；“食我既长，党于祁盈，盈获罪，晋杀盈及食我，遂灭祁氏、羊舌氏，在鲁昭二十八年”。厌，满足。贿，受贿，这里具体指“雍子入其女于叔鱼”（详下）。

此事另见于刘向《列女传》：

叔姬之始生叔鱼也，而视之曰：“是虎目而豕啄，鸢肩而牛腹，溪壑可盈，是不可厌也，必以赂死。”遂不见。及叔鱼长（zhǎng），为国赞理。邢侯与雍子争田，雍子入其女于叔鱼以求直，邢侯杀叔鱼与雍子于朝。……遂族邢侯氏，而尸叔鱼与雍子于市。叔鱼卒以贪死，叔姬可谓智矣。[29]

赞理，理官（掌管诉讼）的助理。“入其女以求直”，将女儿嫁给叔鱼，以求胜诉。这里“族”谓灭族，动词。尸，暴尸示众。

这里刘向评价杨叔姬“智”，是指她能预见羊舌氏将来会遭到毁灭的命运。有意思的是，这种预见的原初依据，却是她的儿子相貌之丑恶。这在今天看起来颇为荒诞，似乎丑人必是恶人、必有恶报。不过，这并不是杨叔姬思想的特色；这件事情的意义并不在此，而在其涉及美丑善恶的关系问题。且看《左传》的一段记载：

初，叔向之母妒叔虎之母美而不使，其子皆谏其母。其母曰：“深山大泽，实生龙蛇。彼美，余惧其生龙蛇以祸女（rǔ）。女（rǔ）敝族也。国多大宠，不仁人间（jiàn）之，不亦难乎？余何爱焉！”使往视寝，生叔虎，美而有勇力，栾怀子嬖（bì）之，故羊舌氏之族及于难。[30]

叔虎，叔向的异母弟弟。叔虎之母是叔向的父亲羊舌职之妾。“不使”，不让她侍奉羊舌职。杜预注：“不使见叔向父。”敝族，衰败的家族。大宠，有权势的宠臣。杜预注：“六卿专权。”间，在君主和羊舌氏之间离间。爱，不舍，此处指嫉妒。嬖，宠爱。栾怀子，栾盈，姬姓，栾氏，名盈，栾桓子之子，晋国下军佐。在栾盈（栾氏）和范宣子（范氏）的斗争中，叔虎因依附栾盈而被杀，其兄叔向亦受牵连而被囚，最终导致羊舌氏被灭族。

（二）“不仁人间之不亦难乎”的哲学意义

杨叔姬的这句话不可轻轻放过：“不仁人间之，不亦难乎？”这里，杨叔姬特别强调了“仁”；众所周知，“仁”是后世儒家的核心观念。同时，杨叔姬还强调“义”，谓之“德义”（详下），如上文谈到的“不可食以不义之肉”；我们知道，“义”也是后世儒家的一个核心观念，即儒家的社会正义原则。[31] 这就涉及“义”与“仁”的关系问题。其实，在中国思想传统中，“仁”与“义”不是并列的观念，即不是后世所理解的并立的“德目”，而是一种观念奠基关系，即“仁→义”。[32] 孟子指出：“仁，人之安宅也；义，人之正路也”；这是“居仁由义”的理路。[33] 孟子还说：“仁，人心也；义，人路也。”[34] 这就是说，正义原则是由仁爱精神奠基的。显然，杨叔姬的思想已经蕴含着这种观念奠基关系。

（三）“彼美其生龙蛇”的哲学意义

杨叔姬所说的“深山大泽，实生龙蛇”，比喻“彼美，其生龙蛇以祸汝”。《左传》的时代，龙并不一定是后世的正面形象。[35] 如《左传》载：“郑大水，龙斗于时门之外洧（wěi）渊，国人请为萗（cè）焉。子产弗许，曰：‘我斗，龙不我觌（dí）也；龙斗，我独何觌焉？禳（ráng）之，则彼其室也。吾无求于龙，龙亦无求于我。’乃止也。”[36] 又如：“董父，实甚好龙，能求其耆欲以饮食之，龙多归之，乃扰畜龙，以服事帝舜。”孔颖达疏：“扰，顺也。顺龙之所欲而畜养之。”[37] 这里杨叔姬所说的“龙”，颇类似西方人所说的“dragon”，乃是凶恶的形象。

杨叔姬将“龙”与“蛇”相提并论，也是这种意味。“蛇”古字为“它”，《说文》解释：“它，虫也。从虫而长，象冤曲垂尾形。上古艸居患它，故相问：‘无它乎？’”[38] 这就犹如今天见面的问候：别来无恙？段玉裁注：“相问‘无它’，犹后人之‘不恙’‘无恙’也。”[39] 最古的例证，《周易》古经三处谈到“有它”，均指作为敌对势力的外族：《比卦》“有孚盈缶，终来有它”[40]；《大过卦》“有它，吝”[41]；《中孚卦》“有它，不燕”[42]。[43] 显然，“它”即“蛇”是一种凶险的“他者”（the other）的象征。[44]

上面这段记载中的“彼美……其生龙蛇以祸汝”和“美而有勇力……故羊舌氏之族及于难”，初步透露了杨叔姬的“甚美必有甚恶”思想（详下）。

三

杨叔姬最具有哲学意义的思想，就是“甚美必有甚恶”的命题，揭示了美丑善恶之间的辩证关系。同时，她所提出的“有奇福者必有奇祸”，也是颇具哲学意义的命题。

（一）文献的记载

命题“甚美必有甚恶”，出自《左传》的记载：

初，叔向欲娶于申公巫臣氏，其母欲娶其党。叔向曰：“吾母多而庶鲜，吾惩舅氏矣。”其母曰：“子灵之妻杀三夫，一君、一子，而亡一国、两卿矣，可无惩乎？吾闻之：‘甚美必有甚恶。’是郑穆少妃姚子之子，子貉之妹。子貉早死无后，而天锺美于是，将必以是大有败也。昔有仍氏生女，黰（zhěn）黑而甚美，光可以鉴，名曰玄妻。乐正后夔取之，生伯封，实有豕心，贪惏（lán）无厌，忿颣（lèi）无期，谓之封豕。有穷后羿灭之，夔是以不祀。且三代之亡、共子之废，皆是物也，女（rǔ）何以为哉？夫有尤物，足以移人。苟非德义，则必有祸。”叔向惧，不敢取。平公强使取之，生伯石。伯石始生，子容之母走谒诸姑曰：“长（zhǎng）叔姒生男。”姑视之。及堂，闻其声而还，曰：“是豺狼之声也。狼子野心。非是，莫丧羊舌氏矣。”遂弗视。[45]

申公巫臣，芈（mǐ）姓，屈氏，名巫臣（一名巫），字子灵，曾任申县之尹，故称“申公”。其妻夏姬，姬姓，郑穆公之女，春秋时期四大美女之一，原为陈国司马夏御叔之妻，故史称“夏姬”；先后七次嫁人，最后与巫臣私奔晋国。“叔向欲娶于申公巫臣氏”，叔向想娶巫臣和夏姬的女儿为妻。党，亲族。“母多而庶鲜”，杨叔姬的亲族女子陪嫁过来的很多，但她们能生儿子的却很少。杜预注：“言父多妾媵，而庶子鲜少，嫌母氏性不旷。”“惩舅氏”，以杨叔姬的亲族女子为戒。

子灵之妻，即夏姬。“三夫”指夏姬的三任丈夫，杜预注：陈御叔、楚襄老、巫臣（此时巫臣已死）。“一君、一子”，杜预注：陈灵公（与夏姬私通）、夏徵舒（夏姬之子）。“一国、两卿”，杜预注：陈国；孔宁、仪行父（均与夏姬私通）。“可无惩乎”，能不引以为戒吗？“郑穆少妃姚子之子，子貉之妹”，夏姬是郑穆公的妃子姚子之女，郑灵公子貉之妹。“天锺美于是”，上天将美丽集中在夏姬身上。

有仍氏，古国名。黰，通“鬒（zhěn）”，稠密的头发。乐正后夔，帝舜的乐正，杜预注：“夔，舜典乐之君长。”贪惏，贪婪。忿颣，忿怒狼戾。孔颖达疏：“其人贪耆财利饮食，无知厌足，忿怒狼戾，无有期度，时人谓之大猪。”有穷，夏代国名。共子，晋国太子申生。杜预注：“夏以末喜，殷以妲己，周以褒姒，三代所由亡也。共子，晋申生，以骊姬废。”“是物”，这个东西，指美色。尤物，特异的东西，指特别美丽的女子。

伯石，又称“杨石”，即杨食我（？－前514年），杨氏，即羊舌氏，名食我，字伯石，叔向之子；其母是叔向之妻、夏姬之女。子容之母，叔向之嫂。“走谒诸姑”，跑去见她的公婆（即杨叔姬）。长叔，指叔向。姒，指叔向之妻、夏姬之女。杜预注：“兄弟之妻相谓姒。”

杨叔姬所说的“非是，莫丧羊舌氏”，意谓除此人（伯石）以外，没人能够毁掉羊舌氏家族；言下之意，此人将毁掉羊舌氏。如《国语》载：

杨食我生，叔向之母闻之，往，及堂，闻其号也，乃还，曰：“其声，豺狼之声，终灭羊舌氏之宗者，必是子也。”[46]

这段事迹，另见于刘向《列女传》，文字颇有出入：

叔向欲娶于申公巫臣氏，夏姬之女，美而有色，叔姬不欲娶其族。叔向曰：“吾母之族，贵而无庶。吾惩舅氏矣。”叔姬曰：“子灵之妻杀三夫、一君、一子而亡一国、两卿矣。尔不惩此，而反惩吾族，何也？且吾闻之，有奇福者必有奇祸，有甚美者必有甚恶。今是郑穆少妃姚子之子，子貉之妹也。子貉早死，无后，而天钟美于是，将必以是大有败也。昔有仍氏生女，发黑而甚美，光可监人，名曰玄妻。乐正夔娶之，生伯封，宕有豕心，贪婪毋期，忿戾无厌，谓之封豕。有穷后羿灭之，夔是用不祀。且三代之亡及恭太子之废，皆是物也。汝何以为哉！夫有美物，足以移人。苟非德义，则必有祸也。”叔向惧而不敢娶。平公强使娶之，生杨食我，食我号曰伯硕。伯硕生时，侍者谒之叔姬曰：“长姒产男。”叔姬往视之，及堂，闻其号也而还，曰：“豺狼之声也。狼子野心。今将灭羊舌氏者，必是子也。”遂不肯见。及长，与祁胜为乱，晋人杀食我。羊舌氏由是遂灭。君子谓叔姬为能推类。[47]

这里的“叔姬不欲娶其族”指夏姬之族，不同于《左传》“其母欲娶其党”指杨叔姬之族。宕，放纵。祁胜，晋国大夫祁盈的家臣。

（二）“甚美必有甚恶”的哲学意义

杨叔姬所说的“恶”，兼有两层含义，即形象上的“丑”和道德上的“恶”。这是古汉语“恶”字的常见用法，例如《左传》“美疢（chèn）不如恶石”[48]；“姬纳诸御，嬖，生佐，恶而婉；太子痤（cuó），美而很（狠）”[49]；“己恶而掠美为昏”[50]；“丑类恶物，顽嚚不友”杜预注：“丑，亦恶也。”[51] 与此相应，“美”也兼指形象上的美丽和道德上的美善。[52] 这与英文一样，“beauty”兼具美丽、美德之义，“ugliness”兼具丑陋、丑恶、邪恶之义。

命题“甚美必有甚恶”，杨叔姬虽然说是“吾闻之”，似乎那是一句既有的名言，而不是她的首创；但是，在早于杨叔姬的传世文献中，我们却找不到这样的表述。当然，在杨叔姬之前或其同时，也有两个比较类似的表达，均见于《左传》：（1）“齐庆封来聘，其车美。孟孙谓叔孙曰：‘庆季之车，不亦美乎！’叔孙曰：‘豹闻之：“服美不称，必以恶终。”美车何为？’”[53] 这是说其人之德与其车之美不相称，并非杨叔姬所说的“恶是从美转化而来”之意。（2）“侨又闻之：内官不及同姓，其生不殖。美先尽矣，则相生疾，君子是以恶之。”[54] 以上两例，“服美不称，必以恶终”“美尽疾生”的表述，不仅不同于杨叔姬的表述，而且都只谈及具体的“车”“疾”，而没有杨叔姬的表述那种普遍性的全称命题的涵盖力。这就是说，至少从既有的传世文献来看，命题“甚美必有甚恶”乃是杨叔姬的首创。

当然，必须承认，杨叔姬的这番议论，与关于妺（mò）喜（末喜）、妲己、褒姒的“红颜祸水”传统观念是不无干系的。然而，我们必须承认：杨叔姬的表述“甚美必有甚恶”乃是全称判断，其字面含义所呈现出来的乃是一种普遍命题，即揭示了美与丑、善与恶之间的普遍的辩证关系。

不仅如此，还应当注意的是，杨叔姬并没有将“甚美必有甚恶”绝对化，她说：“夫有尤物，足以移人。苟非德义，则必有祸。”这就是说，“甚美必有甚恶”并非绝对的，而是有条件的，那就是“非德义”，即缺乏道德上的正义性，才会由美转恶；反之，如果具有“德义”，则可以说“甚美未必甚恶”。

关于美丑善恶之间的辩证关系，人们通常熟知的是老子的思想：“天下皆知美之为美，斯恶已；皆知善之为善，斯不善已”[55]；“信言不美，美言不信”[56]。但是，老子生活的时代，至今仍然存疑。若根据孔子问礼于老子的历史记载，即老子生活在春秋晚期，则晚于杨叔姬。据《史记》载：“（孔子）适周问礼，盖见老子云。辞去，而老子送之曰：‘吾闻富贵者送人以财，仁人者送人以言。吾不能富贵，窃仁人之号，送子以言，曰：“聪明深察而近于死者，好议人者也。博辩广大危其身者，发人之恶者也。为人子者毋以有己，为人臣者毋以有己。”’”[57] 又载：“孔子适周，将问礼于老子。老子曰：‘子所言者，其人与骨皆已朽矣，独其言在耳。且君子得其时则驾，不得其时则蓬累而行。吾闻之，良贾深藏若虚，君子盛德容貌若愚。去子之骄气与多欲，态色与淫志，是皆无益于子之身。吾所以告子，若是而已。’孔子去，谓弟子曰：‘鸟，吾知其能飞；鱼，吾知其能游；兽，吾知其能走。走者可以为罔，游者可以为纶，飞者可以为矰。至于龙，吾不能知其乘风云而上天。吾今日见老子，其犹龙邪！’”[58] 据此可见，杨叔姬揭示美丑善恶之间的辩证关系，确实是在早于老子的时代。

（三）“有奇福者必有奇祸”的哲学意义

此外还值得注意的是，刘向《列女传》还记载了杨叔姬的另外一个命题“有奇福者必有奇祸”。这是揭示祸福相倚的辩证原理。众所周知，老子也有这样的命题：“祸兮福之所倚，福兮祸之所伏”[59]；“以智治国，国之贼；不以智治国，国之福”[60]。但是，杨叔姬揭示祸福相倚的道理，仍然早于老子。不仅如此，在杨叔姬之前的文献中，也找不到她这样的表述；换言之，命题“有奇福者必有奇祸”同样是杨叔姬的首创。

综括全文，杨叔姬是春秋时期的一位杰出的女哲学家。她早于老子揭示了美丑善恶之间的辩证关系，提出了“甚美必有甚恶”的哲学命题。并且，她所提出的“甚美必有甚恶”命题不是绝对的，而是有条件的，即“非德义”。显然，她所强调的“德义”原则，包括“不可食以不义之肉”原则，乃是儒家“义利之辨”的思想先驱之一。同时，她还先于老子揭示了祸福相倚的道理，提出了“有奇福者必有奇祸”的哲学命题。此外，她还触及了后来儒家“仁→义”之间的奠基关系的正义论原理，这一点同样难能可贵。

本文原载《吉林师范大学学报》（人文社会科学版）2024年第6期

2024-12-02
韩昇：武则天时代的官僚阶层与科举

本文载于《学术月刊》2024年第11期

武则天登基以来，内部大狱频兴，朝政空转；外部烽火四起，挫折连连。国势日蹙，完全无法同唐太宗“贞观之治”同日而语，和唐高宗在位时期相比也颇为不如。从人事的角度观察，没有治国统兵的人才是一个重要的原因。要真实反映武则天的用人状况，必须进行全面考察，不可以偏概全。这里从两条线、三个层面切入，观其全貌。

所谓的两条线，第一条线是理应掌管国政的朝官，第二条线是武则天真正委以重任的近幸宠臣。第二条线还可以细分为武家子弟与宠幸嬖臣，以及酷吏两个层面。结合第一条线，构成了任用官吏的三个层面。

一、朝官

第一条线。朝廷最高执政者为一般所称的宰相，亦即唐朝的“同中书门下三品”，或称“同中书门下平章事”，武则天改“中书”和“门下”为“凤阁”“鸾台”，故中书门下称作“凤阁鸾台”。从嗣圣元年（684）武则天废中宗、垂帘听政以来，直到她被政变推翻的神龙元年（705）的二十一年间，任用了五十三位“凤阁鸾台三品（平章事）”头衔的宰相：

刘祎之，武承嗣，魏玄同，苏良嗣，韦思谦，韦待价，张光辅，王本立，范履冰，邢文伟，周允元，岑长倩，裴居道，傅游艺，格辅元，乐思晦，崔神基，狄仁杰，杨执柔，崔元琮，李昭德，姚璹，李元素，韦巨源，陆元方，苏味道，王孝杰，杨再思，杜景俭，王方庆，李道广，娄师德，武三思，武攸宁，姚元崇，李峤，魏元忠，吉顼，王及善，豆卢钦望，张锡，韦安石，李怀远，顾琮，李廻秀，唐休璟，韦承庆，朱敬则，韦嗣立，宗楚客，崔玄𬀩，张柬之，苏瓌。

鸾台（门下省）和凤阁（中书省）的首长亦是宰相。这两个机构掌管皇帝诏敕和军国政令，在皇城内办公，最能接近大内里面的武则天，宛如皇帝左右的鸾凤。

鸾台纳言：王德真，苏良嗣，韦思谦，裴居道，魏玄同，武承嗣，武攸宁，史务滋，欧阳通，姚璹，娄师德，狄仁杰，李峤，韦安石。

凤阁内史：裴居道，岑长倩，张光辅，邢文伟，宗秦客，豆卢钦望，王及善，武三思，狄仁杰，李峤，杨再思。

鸾台凤阁的首长人数亦多，经常变更，受酷吏政治迫害者约三分之一上下。

宰相是朝廷最高首长，中流砥柱，安危所系，本应最为稳定。唐朝建立以后，高祖任用的裴寂，太宗任用房玄龄、杜如晦等宰相，任职时间都很长，对于安定社稷、保持政策的连续性起到重要的作用。然而，到了武则天时代，这种局面骤然巨变，宰相更迭极为频繁，没有任何一个朝廷部门堪与相比。而且，这个席不暇暖的群体，即使把政争中遭到贬黜的情况排除在外，也至少有四分之一以上受到酷吏的迫害乃至屠戮。宰相被呼来唤去，弃之如同敝屣，则所有官吏的处境可想而知，武则天时代政情的高度不稳和内斗的极端残酷，实态毕露。

为什么宰相群体更替最频繁呢？因为武则天对他们把控最严也最直接。武则天身处大内，既无政绩也无功绩，无以服人；同时自然没有共创事业的部属，堪以寄任。而且，她作为高宗内眷的妇女身份，不方便经常和朝廷大臣聚乐宴饮，了解外情，增进感情。所以，她只能紧紧控制权力中枢的宰相群体，通过他们掌控全局。宰相作为政令下达、沟通内外最重要的渠道，必须盯紧看牢。由于朝廷乃至社会民情、官吏所思所想，只能通过文件和密报等间接途径获取，又要应对篡唐立周的改朝换代剧变，随时有被颠覆的危险，这些都会极大加剧生性多疑的武则天的猜忌。所以，她采取频繁更替宰相乃至施以毒手的苛酷手段，势所必然。这里是她控制全局的关键，亦是命门所在。

安危系于此地，宰相的治国能力并不重要，竭尽忠诚才是关键。所以，武则天时代的宰相群体有两个特点：第一，实务政绩型官员很少，大多出自政务官员。第二，武氏子弟实际掌权。武氏子弟是武则天政权最稳定的人事，无论他们是否身处宰相位置，宰相都要听命于他们。推而广之，武则天宠幸的汉子，宰相也要接受其统辖。例如征伐契丹时，武三思为主帅，宰相姚璹为副；征伐突厥时，薛怀义为主帅，宰相李昭德为副等等，乃至造天枢、进颂词之类事务，也是武氏子弟统领宰相实施。

与此形成鲜明对照的是中央最高行政部门的尚书省，其首长在武则天时代最为稳定，二十一年间仅有六位，分别是左仆射：苏良嗣，武承嗣，王及善；右仆射：韦待价，岑长倩，豆卢钦望。除了岑长倩一人被迫害致死外，其余五人基本平安。左右仆射为尚书省主官，武则天时代曾经改称“左相”“右相”，实际上有名无实，和宰相沾不上边。武承嗣作为武氏子弟，不管担任什么职务都大权在握。尚书省长官之所以相对稳定且平安，根本原因就是没有实权。武则天通过宰相直接指挥六部、九卿，作为六部上级主管部门的尚书省形同摆设，主官在位唯唯诺诺，乏善可陈，故各人本传事迹记载寥寥，滥竽亦可充数。韦待价以军功起家，武则天用他担任天官（吏部）尚书、文昌右相，“素无藻鉴之才，自武职而起，居选部，既铨综无叙，甚为当时所嗤”。韦待价自知非治国之才，“既累登非据，颇不自安，频上表辞职，则天每降优制不许之”。武则天为什么坚持把不懂行政的人放在行政主管的位置上呢？其实就是为了将其虚化为承旨画押的华丽道具，便于她直接掌控朝廷。

尚书省上层的人事，如表1所示（分为武则天垂帘听政与称帝两个时段）：

表1

尚书省长官被弱化乃至虚化，但尚书省的职能不可完全废弃，因而出现上权下移的情况，亦即尚书省的左丞（正四品上）和右丞（正四品下）实际处理都省事务。尚书省主官左、右相（从二品）位高权虚，人少稳定，但左、右丞官员颇多，频有更迭，表明他们才是真正主事者。以下级官员主持事务，是独裁者常用的集权手段。由于官位卑下，受到超常重用时感恩戴德，听话卖力。而且，还因为官位卑下，制度上无权参预重大国务，所以让他们参预何种事务，以及参预到什么程度等等，君主皆可随心所欲，权力收放自如。武则天以此手段控制尚书省。

外朝机构主要是六部，其人事任用情况如表2：

表2

人事变更的频度，依次为吏部、兵部、刑部、礼部、户部、工部。用这个指标观察武则天时代各个官署的情况，可以发现它们在朝廷权力结构上的重要性同人事变更频度成正比，越是重要，掌控越严，人事更迭越发频繁。由此归纳出武则天朝的权力秩序及其结构如图1。

图1

这明显是一个以军政为中心的朝廷：一切以皇帝集权独裁为最高目标，由吏部担纲彻底更换官吏队伍，兵部作为权力支柱，刑部作为整肃工具，礼部制造改朝换代的理论与合法性。皇权笼罩于全社会，生产、技术、民生等皆处于从属地位。武则天彻底改变了唐太宗建立的社会发展国策与朝廷架构。

朝廷中最受重视的吏部和兵部，副职的变动异常的频繁，还多次出现其他部难以见到的官员再任的情况。这明显是武则天直接插手，安插委任亲信；同时表明在至关重要的权力部门，武则天倚重副职，越级操控部务，使之完全听命于皇帝。

朝官这条线高级官员的选任情况，根据《旧唐书》和《新唐书》的不完全记载，列示如表3：

表3

两《唐书》官员传记固然不能覆盖官员全体，然而，达到一定的量亦足以反映用人原则和基本面貌。根据上表所示，至少可以确认以下两点：

第一，官员大都出自官宦之家。

唐朝建立后，功臣和高官后裔，特别是军功子弟在仕宦上获得优待。唐高宗仪凤年间，魏元忠上封事指出：

当今朝廷用人，类取将门子弟，亦有死事之家而蒙抽擢者。

北魏孝文帝迁都洛阳推行新官制，按照官职高低分为甲乙丙丁“四姓”等级，确立了优先录用官宦子弟的制度规定，北齐、北周、隋朝和唐朝都沿袭这一原则，武则天亦是如此，故官宦出身者出仕比例甚高，功臣子弟更受重用。太宗、高宗朝名将薛仁贵，儿子薛讷，“则天以讷将门，使摄左武威卫将军、安东道经略”。

从唐朝建立到武则天全面掌权，经过了大约半个世纪，许多功臣业已凋零，功勋门第逐渐变味为官宦之家。官宦出身既是政治可靠的凭证，在入仕升迁上受到重视，也是官场的护身符，在仕途挫折罹难时，能够起到从轻处罚或者事过境迁后东山再起的佑庇作用。开国初期的重视功勋，逐渐蜕变为建政后常规铨选时讲究家世亲缘，武则天对此颇为坚持，有所发挥。岑文本是唐太宗任用的宰辅重臣，其侄子岑长倩因此得到重用，高宗时出任宰相，支持武则天夺权，故长年身居权力中枢，直到武则天欲立武承嗣为皇位继承人之际，因为主张维持亲子继承而得罪武则天，下狱处死。此后在朝廷举荐人才的时候，凤阁侍郎韦嗣立推荐岑长倩族子岑羲入朝任职，并说明其为朝廷罪犯亲属，武则天不但批准了岑羲的任用，而且还为受牵连的高官亲属的任用开了绿灯，“由是缘坐近亲，相次入省”。对落难或者受牵连的官宦子弟网开一面予以任用，显然不是个例，而成为规则，维护着优待官宦子弟入仕的一贯方针。

第二，注重名门家世，尤其是亲缘关系。武氏子弟不循正常途径入仕，应置于第二条人事线论述。武则天的母亲自称出自天下名门之弘农杨氏，实为隋朝皇族之杨氏。隋室杨氏因为是武则天外家的缘故，一直受到重用。武周“时武承嗣、攸宁相次知政事”，武则天对地官尚书杨执柔说：“‘我令当宗及外家，常一人为宰相。’由是执柔同中书门下三品。”武氏和杨氏联合坐庄朝政，成为一条规则。

李唐与隋杨乃姻亲，政治上虽为敌手，亲情却深。李渊的母亲和隋文帝独孤皇后为亲姊妹，隋朝灭亡后，李渊对隋杨皇族给予照顾，亲自做媒将隋朝纳言杨达女儿嫁给武士彟，让这位河东木材商人沾上皇亲国戚的边，成为武则天日后飞黄腾达不可或缺的门槛。武则天显然领悟到王朝政治的奥秘，深知金字塔权力结构的顶端是少数门阀士族垄断权力，运用朝廷强力部门作为工具，实现对整个官僚体系的控制。在她的理解中，管理社会的核心不是遵守规则，而是追求权力的无限扩大，笼罩一切。权力需要人来掌握，掌权的人越少则权力越集中，越有利于皇权。因此，等级森严的寡头政治成为她的营造蓝图。武氏家族（包括赐予“武”姓的皇子）居于金字塔尖，被选中的士族与近宠佞幸组成朝廷上层。有所不同的是被选中的士族相对稳定，而近宠佞幸与酷吏则频频变动，道理在于这些人作为工具固然必不可少，但落到具体的走狗却需要经常更换。近宠佞幸与酷吏属于第二条人事线研究的对象，留待后述。被选中的少数门阀士族颇受重用，飞黄腾达，纷纷跻身于权力上层，与武氏家族共同构成核心统治集团。例如杨氏家族在“则天时，又以外戚崇宠。一家之内，驸马三人，王妃五人，赠皇后一人，三品以上官二十余人，遂为盛族”；韦氏家族之“巨源与安石及则天时文昌右相待价，并是五服之亲，自余近属至大官者数十人”。

唐朝是贵族建立的王朝，高祖李渊以此为荣，建政初期曾经对宰臣裴寂说道：

我李氏昔在陇西，富有龟玉，降及祖祢，姻娅帝王，及举义兵，四海云集，才涉数月，升为天子。至如前代皇王，多起微贱，劬劳行阵，下不聊生。公复世胄名家，历职清要，岂若萧何、曹参起自刀笔吏也。惟我与公，千载之后，无愧前修矣。

得意之情溢于言表。倚重士族和功勋家族成为唐朝人事的重要原则，唐太宗修《氏族志》和武则天重用士族皆为此原则的一脉相传，除了武氏因武则天而破格崛起之外，老牌士族左右高层政治的局面一仍其旧，未有改观。武则天重用士族，寄任之深甚至扭曲制度。唐朝制度规定，近亲不得同时担任高官要职，以防止某一家族权力过大。对此项规定，武则天采取变通的办法规避，李峤担任宰相，两年后其舅张锡也升任宰相，武则天让李峤转任成均祭酒，“舅甥相继在相位，时人荣之”。士族对于权位的诉求也直言不讳。垂拱年间的宰臣韦思谦把两个儿子韦嗣立和韦承庆径直托付给武则天，说：“臣有两男忠孝，堪事陛下。”武则天欣然接受，对韦嗣立明言：“今授卿凤阁舍人，令卿兄弟自相替代。”果如其言，先是韦嗣立接替韦承庆担任凤阁舍人，然后由韦承庆轮替韦嗣立出任天官侍郎，不久又接下韦嗣立的宰辅要职，等到韦承庆去世，又让韦嗣立接任黄门侍郎，前后四度轮替，宛如左右手传接一般。

优待功臣后人，讲究官宦家世，倚重名门士族这三条铨选的基本原则，武则天无不坚持贯彻，比起唐太宗时代逐步开放用人的家世条件，有所倒退。陈寅恪未对武则天用人的实际情况进行整体考察，断言武则天破格用人，培养出新兴阶级攘夺替代西魏、北周、杨隋及唐初将相旧家之政权尊位，“故武周之代李唐，不仅为政治之变迁，实亦社会之革命。”此番议论完全得不到事实的支持。武则天称帝充其量只是僭主篡政，酷吏政治绝非社会革命，新兴阶级亦非权力所能制造，只能是社会生产形态所决定的客观存在。

一朝有一朝的组织原则。武则天朝对于太宗朝组织原则的最大改变，是把对唐朝的忠诚演变为对她个人的效忠。她遴选并重用的官宦士族都遵循这条最高原则。

从小生活在权贵圈子里成长的功臣高官子弟，对于政治人事嗅觉最为敏感，察言观色得风向之先，其中想飞黄腾达的人跟风最紧。丘和、丘行恭父子建唐时立有大功，皆获陪葬皇陵的殊荣。丘行恭之子丘神勣（jì）属于最早投靠武则天的功臣子弟，充当鹰犬，出手害死章怀太子，与酷吏周兴、来俊臣齐名；岑文本侄子岑长倩等一批功臣子弟因为支持武则天取代李唐而得到重用，俱见前述。李大亮的族孙李廻秀，在武则天晚年当上宰相，“颇托附权幸，倾心以事张易之、昌宗兄弟，由是深为谠正之士所讥”。

在逢迎武则天近幸方面，王朝体制内的士族亦不遑多让。崔义玄精通儒经，以学干禄，为唐高宗立武则天为皇后出谋卖力，主持审判长孙无忌。因为这份功劳，两个儿子崔神基和崔神庆都得到武则天的重用。武则天晚年，朝中大臣拼死控告武则天的男宠张昌宗犯罪，崔神庆受命审理此案，竟然为其开脱。张昌宗、张易之兄弟为武则天晚年之最爱，士族官员趋炎附势，卑躬攀附，父子三人皆为宰相的韦氏，韦承庆讨好张氏兄弟；几度进谏武则天的宰相李峤，其实和张氏兄弟交情甚深，以至于武则天倒台后，他们都为此遭到贬黜。宰相杨再思历仕三朝，主持政务十余年，地道的官油子。他善于体察上意，皇上喜欢的，他吹捧得天花乱坠，皇上讨厌的，他诋毁得丑陋无比。有人私下问他身居高位何苦如此呢？他道出为官数十载的心得：正直的官员招灾惹祸，唯有望风顺旨才能保全性命。原来赞美颂圣的合唱队充斥着虚情矫饰的歌手，声嘶力竭的领唱者往往最洞悉内里幽暗。杨再思年轻就通过科举，腹有经纶，黠于应对。张昌宗遭诉，群情汹汹。武则天询问杨再思意见，杨再思说张昌宗炼仙丹给皇上服用，皇上身强体健便是国家万幸，所以张昌宗功劳莫大。避开犯罪事实，只谈皇上重于社稷，利君则利国，情郎瞬间成为英雄，迎合了武则天万难割舍的感情。张易之兄弟大宴朝官，饮酒互捧，张昌宗容貌粉嫩而得武则天欢心，一众官员赞美张昌宗貌似莲花，杨再思挺身纠正道：“人言六郎面似莲花；再思以为莲花似六郎，非六郎似莲花也。”这等话术浸染弥漫成为武周王朝的官风。

各路出身的王朝官员汇聚在一起，国家正事做不了，真话说不得，有失品格的种种表演，未必都是他们猥琐卑劣，而是那个时代的政治生态所致。当然，他们的所作所为反过来也强化了那种环境，互为因果，最终无人幸免。于是，官场晋升的秘径变成通途，“时朝廷谀佞者多获进用，故幸恩者，事无大小，但近谄谀，皆获进见”。

拍马溜须而不做事，即使身居高位也不敢有所作为。在朝不为恶，偶尔说些合乎道理的建言，这在正常的社会属于常识底线，但在武周却足以振聋发聩，勇气有加，难能可贵。武周时代朝官的水平，后人颇有评论：

豆卢钦望、张光辅、史务滋、崔元综、周允元等，或有片言，非无小善，登于大用，可谓具臣。

苏味道、李峤等，俱为辅相，各处穹崇。观其章疏之能，非无奥赡；验以弼谐之道，罔有贞纯。

崔融、卢藏用、徐彦伯等，文学之功，不让苏、李，止有守常之道，而无应变之机。

崔（融）与卢（藏用）、徐（彦伯），皆攻翰墨。文虽堪尚，义无可则。备位守常，斯言罔忒。

这些评价并非贬低之辞。武则天晚年请狄仁杰举荐宰辅高官，狄仁杰当面询问武则天是否觉得当朝主官乃“文吏”之流，不堪大任？武则天深以为然。一朝皆凡庸，是谁之过？然而，到此地步，不想崩溃只能举贤任能，转机因此萌生，历史总要做出选择。

二、近幸宠臣

武周政权的朝官，在大清洗的肃杀氛围中，实际上已经沦为摆设，把朝廷门面装潢得煞有介事，敷衍日常事务，跑腿当差。真正掌握权力的是第二条线，亦即武则天委以重任的近幸宠臣。同第一条朝官线的重要区别，在于他们基本不经过吏部铨选正途入仕。这条线上的人物可以分为两个层面，首先是中心层面，有处于权力中枢、出将入相的武氏子弟，以及武则天信赖有加的男宠团队。其次是前台层面，有刮起血雨腥风、致令人人自危的酷吏集团。这两拨人的权力都来源于武则天。

首先来看中心层面的武氏子弟。武则天时代构成政治权力的亲属基础者，有下面这些人。在亲属关系上，他们分别是武则天的侄子和侄孙两代人；在政治秩序上，他们分别被封为亲王和郡王。

亲王

梁王武三思武则天长兄武元庆之子。

魏王武承嗣武则天次兄武元爽之子。

陈王武承业武承嗣弟，追封。

定王武攸暨武则天伯父武士让之孙，始封千乘王，尚太平公主后进封。

亲王四人：武三思和武承嗣为武则天异母侄子，武承业为追封，三人皆为侄子辈；武攸暨因为尚太平公主而进封亲王，为侄孙辈，乃特例。

郡王

武崇训武三思子，尚安乐公主，封高阳王。

武崇烈武崇训弟，封新安王。

武延基武承嗣子，始封南阳王，后袭父封，坐私议张昌宗，被杀。

武延义武延基弟，袭父封，继魏王。

武延秀武延义弟，封淮阳王。

武延晖武承业子，袭父封，嗣陈王。

武延祚武延晖弟，封咸安王。

武攸宜武则天堂兄武惟良之子，封建安王。

武攸绪武攸宜弟，封安平王。

武攸宁武则天堂兄武怀运之子，武攸暨之兄，封建昌王。

武攸归武攸宁弟，封九江王。

武攸止武攸归弟，封恒安王。

武攸望武攸止弟，封会稽王。

武懿宗武则天堂兄武志元之子，封河内王。

武嗣宗武懿宗弟，封临川王。

武尚宾武则天堂兄武仁范之子，封河间王。

武重规武尚宾弟，封高平王。

武载德武重规弟，封颍川王。

作为武氏子弟集团的附庸，可以加上宗秦客、宗楚客、宗晋卿和纪处讷四人。前三人为武则天外甥，纪处讷则是武三思的连襟。

武氏子弟集团最醒目的特色，是完全未见科举出身者。且不论唐朝高度重视文化，自开国以来就建立起文化程度甚高的官吏队伍，从社会发展而言，以武力开国的王朝到了和平年代，其军功集团的后代也在时代潮流的推动下转而向学，子弟通过科举途径入仕晋升，继续仰仗家世门荫者日渐稀少，受人轻视。武士彟作为唐朝开国功臣，其家族子弟这等学历，透露出武氏家族对于文化的态度，落伍于时代。这批武氏权贵中，最有文化，以至于史家给予记载的是武三思，“略涉文史”，仅此而已。他留下诗歌创作的记录是赞颂张昌宗才高貌美，乃神仙王子晋转世。

武氏姻亲子弟，宗秦客、宗楚客、宗晋卿和纪处讷四人同样未见学业与科举记载。如果把视野扩大到整个第二条人事线，亦即将武则天男宠团队也一并考察，情况如下：

薛怀义原名冯小宝，街头摆摊出身，以魁梧雄壮获得宠幸。武则天为了掩盖这段少年劣迹，令其出家为僧，编入女婿薛氏的士族谱中，主持朝廷宗教事业，找人编撰《大云经》，陈说符命，发现武则天是弥勒下凡。

张易之、张昌宗兄弟，出自贞观名臣张行成家族。张行成少时追随大师刘炫，勤学不倦，应科举及第，历仕太宗、高宗两朝，为一代名臣。张易之兄弟是张行成的族孙，不可思议的是文化家族的子弟竟然不循科举正道，张易之是依靠门荫入仕的，因为白皙美貌，擅长音声。其弟张昌宗首先被太平公主发掘出来，用得称心，转而推荐给武则天，同样表现不俗，大得欢心。张昌宗推荐兄长张易之说：“臣兄易之器用过臣，兼工合炼。”原来这对兄弟兼具炼丹才能。由此可知，他们自小研修道家房中阴阳之术，耽于学业，故难应科举，只好走门荫之路。张易之、张昌宗兄弟几乎专宠，武氏权贵争相为他们牵马前导，招摇过市，加上以权贪赃，惹来妒忌非议，沸沸扬扬。武则天为了遮掩丑闻，让他们主持朝廷文化事业，集中天下美少年和宰辅大臣们，济济一堂，组建文化机构“控鹤监”，更美其名称“奉宸府”，编撰《三教珠英》等大型文集，煌煌千余卷。

薛怀义的宗教事业，张易之兄弟的文化事业，再往前追溯到北门学士的巨著编撰，有人称之为武则天大力推动的文化盛世。

做出如此不凡成就的张氏兄弟，虽然没有科举出身，亦非胸无点墨，史书记载张氏兄弟勉强能写成文章，至于和武则天酬唱应对的诗文，自有宋之问、阎朝隐等文学工匠代笔。

武氏子弟与男宠团队，以及他们同武则天的关系如何呢？在政治风头上，男宠团队风光无限。早先得宠的薛怀义，乃至后来的新欢张易之兄弟，进出内外，武承嗣、武三思一帮武氏子弟争先恐后为其牵马执辔，献诗赞颂，卑辞厚礼，媚态可掬。作为武周皇族却要竭力逢迎男宠，武氏子弟心有不甘，武承嗣的儿子武延基与妻子永泰郡主，以及懿德太子等人私下聚集议论，谈到张易之兄弟任意出入宫中，无不愤恨难耐，摩拳擦掌。这些议论竟然不翼而飞传入武则天耳朵，武则天大怒，逼令武延基自尽。私下非议竟然要付出生命的代价，武则天心中的情感天平清晰可见。然而，这只是表面现象，在政治利益的天平上，武氏子弟才是根本，是武周政权的根基和血脉，武周政权总归要传给姓武的，以至于武则天的亲生子女都要改姓武，试图将他们塞进武氏血脉。武氏子弟充当男宠的马前卒，武则天当然知道，且乐见所为，如果企图反抗则铁腕镇压，绝不留情。这是什么道理呢？并不是男宠金贵，而是武则天将他们当作自己的化身和试金石，测试属下是否绝对驯服而已。服从男宠就是服从武则天，男宠不为人齿，却能够做到诚心悦服，证明对于武则天的驯服臻于精纯，绝对到肝脑涂地，万死不辞。

武延基是未经风浪的权贵子弟，自命不凡，这恰是心生二志的萌芽，咎由自取。其父辈武承嗣和武三思则迥然不同。武承嗣写不了诗文，却将马牵得十分安稳，让薛怀义和张易之兄弟享尽荣耀。武三思粗通文墨，双眼如炬看出张昌宗乃神仙转世，亲自写诗，还组织编排大型音乐舞蹈表现仙人下凡的绚丽场面，让崔融动情绝唱：“昔遇浮丘伯，今同丁令威。中郎才貌是，藏史姓名非”，把自己感动得涕泗俱下。为什么父子两代差距如此巨大呢？道理就在于武则天同兄弟的关系。武承嗣的父亲武元爽、武三思的父亲武元庆，以及其他诸武的父辈如武惟良、武怀运等等，武则天幼年饱受他们的欺负，尤其是武则天的母亲对他们恨之入骨绝不宽恕，让武则天掌权后给予摧残泄恨，武元庆、武元爽遭黜，配流岭外而死；武惟良、武怀运被诬陷下毒害死外甥女韩国夫人，被处死。武则天的兄弟，自己不死，就只能等待处死。武承嗣和武三思早年都曾随父亲配流边荒，武则天决意篡唐建周以后，出于政治需要才把他们召回京城。骤落暴起，亲尝政治炎凉与绝情，武承嗣和武三思对于姑妈早已胆战心惊，变得十分乖巧，虽然身居高官，却十分清楚权力来自何方，对此顶礼膜拜。这种出格的表现有违自然，看似尽忠，实为恐惧。捆绑到篡唐立周的战船上，构成吴越同舟的共同命运，捍卫武则天就是保卫自身的政治特权，背后的驱动力不是绝对忠诚的感情，而是荣辱与共的利益。

利益为本，必定得陇望蜀。武承嗣欲望和野心膨胀起来，想独占权力，便策动武则天尽诛李唐子孙，同时组织宵小请愿，试图成为太子，吞下武周的果实。武则天未遂其愿，致令武承嗣怏怏而死。作为政治精算师的武则天，是信任有父仇的侄儿，还是相信亲生的儿子呢？武承嗣越界了，利令智昏，自取灭亡。从他儿子武延基非议张易之兄弟一事，武则天难道看不出来武承嗣不为人知的家庭内部只讲利益不尽忠诚的真情吗？武承嗣和武延基父子之死，显现出武则天的底线：皇位传给姓武的亲生儿子，武氏子弟掌控朝廷，成为武周政权的核心。所以，武则天花费更多的心血培育武氏第三代，几乎都封为郡王，出将入相，以保武周江山长远稳固。武则天的政治算盘在内心早已权衡清楚，决不是晚年在大臣的谏言下幡然醒悟，立子继承。大臣们的谏言因为契合武则天的心意而被采纳，同时也给了跃跃欲试的武氏子弟一个无法扭转的交代。通过和朝中大臣讨论继承人问题，武则天也摸清了大臣们的政治态度。她这个决定是明智的，武氏第三代在内政外交上的庸劣表现，根本不可能作为皇帝撑起大局，与其被推翻，不如回归政治合法性。之所以成为糊不上墙的烂泥，武氏几代人皆无学业与科举，已经有了答案。

其次来看前台层面的酷吏集团。在唐朝，武则天时代首次出现酷吏，完全改变了政治规则和社会风气，影响深远。唐朝的出现不仅是一次成功的改朝换代，而且是一场重要的政治革新。五胡十六国南北朝分裂时代，恃力使诈成为政治常态，社会上层失德，下层失信，导致国家数百年难以真正统一。唐太宗总结历史教训，致力于重建法律与制度，取信于民。唐朝建立到武周时代将近七十年，垂拱而治，依靠的就是官民互信，制度公平。到唐太宗晚年，“天下刑几措，是时州县有良吏，无酷吏”。武则天僭主当政，威望不足，忧惧群臣不服，便重用一批酷吏大规模整肃异己，构陷告密，开启酷吏政治时代。酷吏政治与武则天执政相始终，甚至长于武周政权的存在时间。武则天倒台之后，酷吏政治随之而去。但是，它没有消亡，而是潜伏在帝制体内，不时兴风作浪。

酷吏作为僭主独裁的主要工具，威慑并实际控制整个官僚阶层，因此，他们无疑处于政治权力结构的顶层。另一方面，酷吏的所作所为，乃秉承上意，因此，他们常常被轻视为君权行使的道具，而非具有独立意志和利益的集团。事实并不尽然，当工具坐大的时候，便逐渐膨胀起欲望，从狐假虎威，假公济私，直至奴大欺主。武则天对薛怀义隐忍再三，唐朝多少皇帝死于宦官之手，说明任何政治集团一旦成形便有了主张和利益诉求。所以，酷吏集团不可仅仅当作皇权的影子简单处理。当告密和清洗全面铺开之后，海量的案件并非君主所能掌控，检举何人，镇压什么，都与发起的酷吏的感情、学识、见地和利益息息相关。他们的指向性变成强有力的鞭子和精神指挥棒，逼迫并规定着官僚队伍的思想观念、施政行为和价值取向，进而深深地影响文化程度不高的芸芸众生，形成弥漫世间的社会风气。最终出现的结果往往和君主最初的政治蓝图不尽吻合，甚至相去甚远，原因就在于君主和酷吏文化水平和利益见识的落差。君主用工具剪裁世界，酷吏则以其品行见识塑造世界。大千世界从来不是单方面所能制造的，而是各方面合力的产物。

酷吏的身世塑造其品行和情感，文化见识规定其眼光和行为。这两者又极大地左右着官僚队伍乃至整个社会的文明水平。吏治庸劣从来都是社会堕落的驱动力。

武则天时代，告密成风，酷吏成群。然而，能够得到武则天重视，挑选出来兴风作浪，成为酷吏代表的主要有以下这些人：

来俊臣，乡间地痞；左台御史中丞。

周兴，少习法律；秋官侍郎，尚书左丞。

傅游艺，吏员；同凤阁鸾台平章事。

丘神勣，官宦子弟；左金吾卫大将军。

索元礼，胡人；游击将军。

侯思止，家奴无赖，文盲；朝散大夫，左台侍御史。

万国俊，乡间地痞；朝散大夫，肃正台侍御史。

来子珣，无学，告密入仕；左台监察御史。

王弘义，告密入仕；左台侍御史。

郭霸，吏员，革命举；左台监察御史。

吉顼，进士；天官侍郎，同凤阁鸾台平章事。

让一个时代陷入血腥恐怖的酷吏，只有吉顼一人是进士出身。少时读过书的仅见周兴，曾经学习法律，为日后翻弄法条打下基础，属于刀笔吏。上述11人中，9人出自乡间地痞无赖，甚至侯思止还是个文盲，却官至左台侍御史，主持监察炼狱。这批人的行迹与文化程度，显然无法通过朝廷正规的仕进考察，所以都由武则天直接提拔重用。武则天用的人，再荒唐也不容议论。侯思止言行举止粗野愚蛮，成为官场笑柄。武则天知道后，怒斥嘲笑者：“我已用之，卿何笑也？”当听说了侯思止那些惊动四座的话语，自己也忍不住喷笑。侯思止丑态百出，在武则天看来却是愚忠可靠，故其官位坐得十分牢靠。11人中，有文化学业者2人，占18%。另一方面，升任宰相的也是2人，同样占18%。文化低同官职高形成鲜明的对照。

武则天时代用人的两条线、三个层面，第一条朝官线基本遵循入仕正常规则考察录用。在武则天酷吏大清洗的恐怖气氛下，动辄犯咎下狱，故上上下下明哲保身，敷衍了事。他们整体文化水平最高，权力却最小，得过且过，形同摆设。第二条线的中心层面，有武氏子弟和武则天男宠团队，文化程度颇低，职位最高，握有大权，构成武周政权的政治人事基础；前台层面的酷吏集团，基本由地痞无赖出身者组成，通过诬告或者兼进谀词而获重用，飞速蹿升，权势熏天。和中心层面相比，前台层面的酷吏集团是必须的存在，至于具体的个人则需要经常更换，败亡亦在瞬间。他们得意之时极尽残忍，破灭之际人剐其肉，遗臭万年。他们刮起互害之风，自己无一幸免，“既为祸始，必以凶终”。

从人事结构来看，武则天时代是武氏子弟、男宠团队和酷吏集团联合管控朝官，进而掌控全社会；同时也是无知对文化的压制，权力对于法律制度的践踏。

三、士族政治与科举

武则天基本遵循唐朝官员入仕与晋升的铨选原则，另一方面则在权力的上层重用武氏子弟、男宠团队和酷吏，掌控百官的黜陟乃至生杀大权，主宰政局。她重用之人学历低，非贵族名门出身，格外引人注目，以至于有研究者把武则天作为唐朝政局的分水岭，认为武则天大量提拔庶族寒门，改变了门阀士族对于政治的垄断。陈寅恪进一步把视野扩大到北周，认为当年宇文泰组建关陇地区胡汉各族实力人物组成的“关陇集团”，垄断政治直到武则天方才打破。武则天大批提拔科举出身的人入仕，形成“新兴阶级”，如此则武则天不仅在唐朝，乃至在中国古代历史上都是改变历史进程的领袖。以一人之力改变三代王朝的历史方向，这样的功业恐怕空前绝后。

陈寅恪对于南北朝隋唐史研究的贡献，在于提出了宏大的问题，启发历史学家去思考和论证。学说的成立，首先要通过证伪的检验，其次才是不同视角的分析论辩，在思想碰撞中发展。

陈寅恪对武则天历史定位的基点是魏晋南北朝以来的士族政治。首先需要厘清的概念是这个历史阶段的士族与士族政治。士族指的是社会的统治阶层。士族与皇帝为主导的政治、军事势力结合，相互依靠，掌控并长期把持中央王朝到地方的政治权力。曹魏建立“九品官人之法”，表面高举“唯才是举”大旗，很快转为重视家世，到了西晋则日益强调家世礼法，从铨选制度上极大强化了官僚士族的特权地位，世袭垄断政治权力，形成固化的士族政治形态。魏晋南北朝的士族，大多起自东汉崩溃以后一再出现的大动乱，在兵荒马乱中聚集亲族乡党据险自保，组成自立武装，割据乡村，概称为“坞壁”。几百年的战乱和外族入侵，使得坞壁得以长期维持，遂演变为世家大族，将地方社会碎片化，以至于重新建立的各个王朝都必须得到他们的支持才能控制地方。世家大族大小不等，大者跨郡连州，千家万户；小者数百家一族，武断乡曲。他们通过联姻构成亲族网络，跻身于王朝官僚之中，凭借在乡势力支持政权，利用国家权力垄断地方。婚和宦是支撑士族长久不衰的两大法宝。士族内部有高下等级之分，这种区分不仅凭借在乡实力和官位高低，还根据文化和声誉，虽然不像确定官品那样清晰严格，但也有必备的条件：连续几代人中出现公卿宰辅一级的高官，属于政治条件；颇有文化学养，遵循礼法家教，属于文化条件。政治和文化两方面条件都具备的世家大族，受到社会普遍的承认与重视，例如北朝隋唐的崔、卢、李、郑、王等山东士族，韦、裴、柳、薛、杨、杜等关中士族，被视为最高的门第。其下还有各个州郡级别的士族等级，构成从朝廷到地方的世家大族等级结构。王朝在此基础上，结合在当朝官职的高低，编撰氏族谱，作为铨选的家庭条件和分配政治权利的依据。北魏孝文帝开其端，划分甲乙丙丁等第；后续王朝全都跟进。唐太宗修《氏族志》，唐高宗和武则天重修《姓氏录》，表明对于士族等级秩序的高度重视。据此可知，说武则天力图打破士族政治，不知从何说起。兼具实力、官品、文化三者优势的士族，得到各大政治势力的积极拉拢，成为其政权的支柱。他们在朝身居高位，在地雄踞一方，并且根据各自的身份地位形成比较固定的通婚圈，备受瞩目，演变成社会上重视的“门阀”。这种政治生态称为“士族门阀政治”。在确定士族身份等第的时候，文化条件颇为重要，品行与学术决定家族的声誉和社会影响。官职高却没有文化被视为权势豪门，地方上有实力缺少文化的家族被称作豪强，总之同具有文化色彩的“士”难以沾边。所以，士族研究从这个角度区分兼具文化学养者为士族，仅凭官职或者强宗势力者为世家大族。当然，这一区分并不是那么严格。作为统治阶层，常见笼统使用士族一词。

在世家大族或者士族等级秩序的框架之内，其下层被称作“庶族”“寒门”等。以往的研究对于士庶之分并不清晰，如果以五品以上官职划线，那么庶族就是下层官吏直至小地主之家，缺乏权势的中小地主自然被归为“寒门”。他们也被称作“庶族地主”等。然而，无论士族、庶族，他们都属于统治阶层。即使武则天时代出现大量提拔庶族寒门的现象，既不构成“新兴阶级”，也完全称不上“社会革命”，充其量只是统治阶层内部的成分调整。何况武则天任用的官员，如前面列示的三个层面，酷吏多为无业的地痞游民，连“寒门”都构不上；武氏与男宠固然文化水平低，但其家族在唐朝已经上升为功臣权贵，甚至是皇族，无法再用“庶族”指称他们；而朝官的选任与唐朝开国以来的状况没有大的变化。综合三个层面所展示的真实状况，无法支持陈寅恪所谓武则天缔造庶族寒门“新兴阶级”的假说。陈寅恪并未提供实证分析的根据，不知所本，故只能对其结论提出商榷。

其次，陈寅恪提出的“新兴阶级”，最重要的特点是科举进士出身，工于为文。亦即武则天之前，唐朝铨选重视门第家世，用的是“西魏、北周、杨隋及唐初将相旧家”，而武则天破格录用科举出身者，形成与所谓“关陇集团”对立的“新兴阶级”。

这里涉及两个问题：1.支撑起北周、隋、唐政权的旧家，亦即所谓的“关陇集团”的存续状况。宇文泰以后来所封的八柱国、十二大将军等二十余家创业家族为核心建立西魏、北周政权，所言甚是。但是，这一创业功勋集团从北周宇文护专政时起就遭受猜忌和镇压；北周武帝辉煌的功业昙花一现，人亡政毁；杨坚政变建隋，抑制并清洗宇文泰组建的关陇集团主要家族，隋炀帝则重用江南士族。李渊建唐，依靠的是河东士族与大姓，唐太宗则强调用人上的五湖四海。这一历史进程呈现了走出关陇的清晰脚印。政权长治久安的人事基础在于用人区域和社会阶层的广泛性，统治者只要不失心智，自然深谙个中道理。

2.政治史所讲的地域政治集团，是指集中任用某地人的政策与原则。西魏、北周的统治地域仅仅局限于关陇地区，只能任用关陇人事，别无他选，并不是拥有广阔的统治区域而有意识地专门任用关陇人事。所以，所谓“关陇本位政策”或者“关陇集团”，说了等于没说。更何况严酷的政治现实，生死攸关，且政治目标与利益各不相同，从来都是一朝天子一朝臣，甚至是一朝天子数朝臣，哪有一朝大臣数朝皇帝，更加不可思议的竟然是一朝大臣三代王朝，从理论到现实都不成立。从宇文护到隋文帝，执政者仅有关陇地区的从政经历，故人事基础局限在这里。即便如此，他们也在扩大用人的面，压制宇文泰的创业“旧家”。明白无误的变化出现在隋炀帝时代。隋朝成为全国性政权之后，用人的区域日渐扩大。隋炀帝曾经指挥统一江南的战争，皇后又出自南朝皇族萧氏，故他重视南方，拔擢江南士人，委以重任，甚至主导朝政，极大改变了关陇官僚居多的成色。唐朝自创业时起，就以太原组建的班底构成核心人事圈，笼络山东士族。武士彟就在此时进入政治核心圈，崛起于政坛；武则天也是因为功臣之女才选入宫中，日后执掌权柄。武氏是李唐政权下的既得利益者，和其他创业家族命运与共，构成唐朝人事的基本盘。武则天出于一己之私，打压李唐政权的忠实支持者，但她主要用没有社会根基的酷吏集团作为打手整肃官僚，并没有从根本上改变官僚队伍的成分和用人路线。其道理显而易见，武则天要的是至尊皇权，而不是摧毁自己赖以生存的政权根基；她要在最高权贵阶层中长期占有武氏一席之地，并不为酷吏之流痞子政客谋求利益，改变权贵阶层的结构；她处心积虑推进李、武联姻，就是为了补武氏合法性短板从而获得长远安定；她深知根深蒂固的士族阶层的重要性，所以对韦氏、杨氏、崔氏等老牌士族笼络重用，甚至让他们父子兄弟同时身居要职，宽容他们对于男宠团队乃至武氏子弟的轻蔑；她扮演官僚、“旧家”敌对者的角色以煽动下层，却没有改变李唐依靠官宦士族的组织路线。所以，武则天表面上看似泼辣凌厉，其实内心极其精明，她走在极端政策的边缘，却在最关键之处未越雷池一步。从本质上看，她是士族政治的坚定维护者，而非掘墓人。

北周隋唐的相关性在于三代王朝的建立者同出一源，此偶然现象的关键在于北周、杨隋皆短祚，事起仓促，只要不是被其他政治势力所征服，剩下的便是同一平台脱颖而出的新秀。新的创业者都是受到当政者压迫而心生异志的雄才，而非同一事业的前仆后继者。理念、目标和利益各不相同，如何构成同一体质的政治集团呢？所以，所谓的“关陇集团”把持北周、隋、唐三朝政治的议论，属于想象的建构。

不存在垄断三朝政治的所谓“关陇集团”，却出现一个确定不移的现象，那就是官员铨选与晋升中，科举出身者日益增多，反映出对于文化的要求越来越高，成为大势所趋。为什么会出现这样的变化呢？这一趋势究竟是个人意志的产物，还是国家社会发展的必然？

自从东汉末年董卓被杀时起，朝廷就失去了对全国的统制，内战爆发，一步步沦为彻底的分裂割据，直至唐朝建立为止，中国在战乱和分裂中度过了将近四个半世纪的漫长岁月，其间虽然有过西晋和隋朝短暂的统一，却都以失败告终。分裂战乱的时代，真理由思辨的洞彻发现沦落为暴力的胜负角逐所决定，乱世的最高道理就是胜利。所以，这个时期过眼云烟般的繁多政权无不把实力和功绩作为用人的根本标准。曹操一再发布《求贤令》，公开倡导重用反道德能取胜的人，开启其端。魏文帝时代创立“九品官人之法”，任命中央到地方各级中正官来评选人才。这个制度存在根本性的内在冲突，亦即选用士族来贯彻“唯才是举”，不啻缘木求鱼，随着时间推移越来越走向反面。士族出身的中正用“家世”条件评定人才品级，结果“九品中正制”成为加强并固化士族门阀的强有力机器。另一方面，频仍的战争涌现出许多勇武的将领，在军事化的国家机器中占据主流。“九品官人之法”制造门阀士族，军国体制制造军功阶层，源源不断，在王朝政权内混杂合流，形成门阀和军功两大特色，在权斗中共存。权斗源于军功阶层对于文化和士人的蔑视，痛下杀手，必欲将其奴仆化。北魏崔浩事件等等，层出不穷的文化大狱莫不因此而生。共存则是对现实的屈服。攀援上权力宝座的各色人物，无不企图固化既得利益，军功和权势皆不可长久，丛林互噬注定没有胜者，欲求长在，最有效的途径就是转变为门阀，故不可一世的军功阶层不得不向现实低头，与士族合流，借其金字招牌悬挂在列戟的大门之上。北魏孝文帝确定胡汉姓族等第，令其通婚联姻，这样的事例屡见不鲜，道理如出一辙。皇帝亲自做媒，不是开婚姻介绍所的业余爱好，而是营造铁打江山的专业操作。

军功士族门阀政治，是东汉灭亡以来中国长期不能统一、政权无法稳固和不断发生社会动乱的根源之一。因此，想要建设稳固富强国家的统治者都必须改变这种局面，任何根本性的社会变革，一定依靠法律、制度乃至文化价值观的改造重塑。人治最不可靠，凌驾于法律制度之上的人治，既可行善，亦可为恶，方向上飘忽不定，则易于颠覆。隋文帝建国之后，断然拆除士族门阀政治的制度台柱，“九品及中正至开皇中方罢”。自曹魏创立以来沿用数百年的“九品官人之法”终于被废除。此举有利于扩大政权的社会基础，故为后面的王朝所遵循。

军功士族门阀政治是战乱时代军国体制的产物，打破门阀政治不仅是制度的变革，更是治国理念的转变。古人说马上得天下，王朝是靠武力打出来的。可是，政权建立之后，国家能否继续采用军国体制管理呢？唐太宗对此有深刻的认识，他还是秦王、天策上将的时候就积极延揽四方文学之士，皆为一时之选，其佼佼者号称“十八学士”。戎马倥偬之际，唐太宗仍然和文士一起深入研讨如何治理国家，从根本上认识到治国不能采用军事命令式的行政强制，更不能听任权力恶性膨胀，凌驾于一切之上，必须讲道理，重规则，建立完善的法律与制度，提升社会文化道德水平，才能实现国家繁荣强大、长治久安的目标。政治路线须要人去落实，什么样的人做什么样的事，所以官吏铨选至关重要。隋文帝废除“九品官人之法”，代之以科举考试，隋炀帝进一步确立进士科的主流地位。唐朝建立之后，强化了科举在铨选中的比重，特别是唐太宗以军事统帅的威望率领创业的军功部属转变崇尚武力的思想观念，积极倡导文治，大力办学，拓展科举入仕的途径，坚持不懈，推动社会形成尊重文化、推重科举的风气。唐朝科举设秀才、明经、进士、明法、明书、明算六科，秀才科后来停止，明法、明书、明算三科为专门科目，故常设科目为明经和进士。盛唐以后进士科越来越显赫，压过明经科，求仕进者趋之若鹜。唐五代时人王定保撰述唐代科举状况，说道：“进士科始于隋大业中，盛于贞观、永徽之际。搢绅虽位极人臣，不由进士者，终不为美，以至岁贡常不减八九百人。”显而易见，唐朝甫建即大力推进科举制，发展迅速，到唐太宗贞观年代已经被世人视为仕进正道，很受尊崇，以至于权贵子弟通过门荫或者军功起家者都以不能由科举入仕而深感不足，一生抱憾。唐太宗以广收天下英才为己任，当年见到新进士缀行而出的场面，欣然说道：“天下英雄入吾彀中矣！”偃武修文打下唐朝将近三百年的根基。文化的盛大，不是用读书人的多少所能表现，最重要的是社会各个阶层都崇尚文化，以此为荣，蔚然成风，这就是王定保盛赞贞观时代的立论所在。读书人虽多却甘当鹰犬，重用近乎文盲的酷吏镇压士人，哪怕编撰出颂扬的诗文，王定保并不以为是文化盛世。把科举作为入仕正道是唐朝既定国策，坚持推动，不待武则天方才肇始。这个重要转变是军功立国后走向长治久安的必由之路，具有客观的必要性与必然性，从这个意义上看，唐太宗并无首创之功，却是及早的觉悟者，避免了有害无益的弯路和折腾。

科举制同九品官人之法相比，颇具公平性。后者注重家世出身，造成士族高门几乎垄断官场的僵化局面。科举则允许士子报名投考，凭借个人成绩录用，打开了社会下层之人上升的通道。侯君集、孙伏伽等都是自寒素举进士入仕、初唐位居朝廷大臣的著名例子。唐高祖武德五年（622），居住在邺城的陇西人李义琛、李义琰兄弟，及其堂弟李上德三人同年考上进士，成为佳话，载入史册。投名报考的科举制一旦取代九品官人之法，下层士子入仕的比例必然越来越高，毋庸置疑。然而这是一个发展的进程，积累数十年的科考，到武则天时代寒门出身增多，只是结果的呈现。社会的进步必定是人身及身份性限制的日益解除。科举制代表的正是这个方向。

需要特别指出，论述科举制时往往把录用人数的增多作为制度推进的直接有力证据，实则南辕北辙。科举制不是一般的入学考试，而是入仕当官的铨选，所以不可能大量录取，其名额由每年需要补充的官吏员数来决定。唐太宗励行小朝廷，精兵简政，其组建的朝廷人数有各种记载，这里选取较多的一种，《新唐书·百官一》记载：“初，太宗省内外官，定制为七百三十员。”乱世的社会平均年龄颇低，故初唐朝廷需要世代更新的人数很少，决定了科举录取人数必定较少。依照唐朝“壮室而仕，耳顺而退”的三十年仕宦期，到唐高宗年代，科举录用人数呈现梯度增长，并随时代推移构成阶梯式上升，都是自然而然之势，绝非个人一己托天之功。

依据官吏世代更新的需要决定科举登科人数的原则，唐高宗显庆二年（657），主持人事的黄门侍郎、知吏部选事刘祥道上疏，核算当时内外文武官，九品以上者13465人，取大数放宽至14000人，每年补充500人都要多出不少，而当年补充了1400人，超过需要2倍多。亦即唐高宗时代，科举登科加上依靠门荫等入仕者早已供大于求，不能再扩大科举登科人数了。由此可知，唐太宗贞观时代科举录取人数较少，反映当时严格执行职官的编制。高宗时代已经在10∶1的求仕压力下增加了很多登科名额，人浮于事。到了武则天时代，为了夺权乃至篡唐建周，武则天“务收人心”，做了重要改变，一是取消考试糊名，致使贿赂公行。例如来俊臣握生杀大权之时，大肆收受请托，每次铨选都要违法安排数百人入仕，至于王公权贵的插手科考，难以尽述。二是不按成绩，任意扩大录取人数，天授二年（691），十道举人，大批拔擢官吏；长寿元年（692）一月，武则天接见各地举人，“无问贤愚，悉加擢用，高者试凤阁舍人、给事中，次试员外郎、侍御史、补阙、拾遗、校书郎”，数以百计。而且还开启了试官制度，大批未能通过考试和品行考察的人，先行任职，把政务和民生当作儿戏。新录用的官吏如此之多，致使各个部门冗官泛滥，社会流传歌谣称曰：“补阙连车载，拾遗平斗量；欋推侍御史，碗脱校书郎。”

官多至滥，并不会给下层寒士带来普遍的机会。武则天任用士族主持铨选，李峤录用权势之家的亲戚二千余人，以员外郎身份到各部门掌管事务，同在编的主官发生激烈争执，甚至互相殴击。此类官场乱象，是士族权门内部利益之争，和平民有什么关系呢？在古代史上，冗员滥官从来是权力泛滥的表现，带来的是更大的不公平。从武则天时代到中宗、睿宗朝急剧膨胀的“墨敕官”“斜封官”，请托贿赂充斥官场，便是其结果，而庶族寒门更无升迁的希望，如何形成与士族“旧家”对抗的“新兴阶级”呢？冗员滥官是对法制的破坏，绝不是科举制的进步，更不是所谓的“社会革命”。只有一律平等的公平制度，才是下层士子的上升通道。

武则天时代的官僚阶层，呈现出非制度性越级提拔的武氏子弟、男宠团队、酷吏集团真正掌握权力，控制朝官的状态。朝官则基本遵循铨选途径入仕，没有改变唐初以来士族与官宦子弟为主的基本面貌。魏晋南北朝隋唐时代最重要的社会变革是科举制取代九品官人之法，拆除士族门阀政治的制度性支柱。这场变革肇始于隋朝，成为主流而蔚然大观于唐太宗时代，武则天时代未见制度上的进步，冗员滥官却对制度造成重大伤害，反而阻碍了寒门士子的正常上升。

2024-12-02
刘永华《程允亨的十九世纪：危机》

文章节选自《程允亨的十九世纪：一个徽州乡民的生活世界及其变迁》（刘永华著三联书店2024-11）

Crisis (excerpt)

光绪九年程氏兄弟分家之后的最初几年里，无论是清王朝还是程家自身，似乎都没有发生太大的变化。不过分家十几年后，这个世界发生了几次令人震惊的事件。甲午一役，大清海陆军败北，朝廷不仅面临巨额赔款问题，国内改革的呼声也越来越高。维新运动接踵而至，但不久即告失败。随后是庚子年的义和团运动爆发、八国联军侵华和辛丑年的巨额赔款。程家自身的变化发生得要早些。光绪十六年、光绪十八年，允亨的双亲先后去世。光绪十九年，同仓成亲。一年后，新一代出生。程家完成了新一轮的代际继替周期。此后，八国联军攻占北京那年，程家发生了家计危机。这些发生在一个王朝和一个农户层面的国事与家事，并非没有丝毫关联。这两个层面发生的事件，以及其他一些因素，都在程家家计危机的发生中扮演了或大或小的角色。

国事与家事

为理解这一时期程家的生计环境，我们先来梳理一下此期对程家生计影响最大的两种商品——大米与茶叶——的价格。太平天国后期，江南、安徽、江西一带米价大幅上涨。

太平天国结束后，各地米价普遍下跌。从 19世纪70年代中叶至80年代中叶，米价基本保持稳定。此后价格逐渐上涨，1895年的米价比1875年上涨了50%。至清朝覆亡时，米价比 1875年上涨了1.5倍。可以想见， 1895年前，米价上涨相对缓慢，而此后15年的时间里，价格涨幅较大。回到婺北米市， 19世纪后期价格运动的总体方向与其他地区相似，不过 19世纪末以前的涨幅不甚突出。如表 7.3所示，太平天国后，婺北地区的米价大致回落至19世纪40年代的水平（但略低于道光十九年、二十年），这个价位基本维持至90年代中期。至 19世纪末20世纪初，在全国米价攀升的影响下，婺北地区的米价也迅速上涨。光绪二十二年米价是每石2.5元，光绪二十六年攀升至3.06元，比光绪二十二年上涨了22%。那么茶价呢？根据第六章的讨论，太平天国运动结束后，茶价为0.19元/斤左右，较太平天国运动开始后的价格（0.13—0.15元/斤）稍有回升，但远低于运动爆发前的水平（ 0.29元/斤）。分家后，茶价一度有所下跌（ 0.155元/斤），光绪十八年后稍有回升（ 0.166元/斤），但仍较分家前低了将近13%。与此同时，跟分家前相比，程家生产的茶叶总量也有所下降。分家前，每年产茶 2担左右，分家前几年甚至达到年产 3担的峰值。分家后因茶园分割，产量回落至年产1.5担至2担余的规模。因此，分家后的十余年时间里，米价和茶价/茶叶年产量之间的剪刀差有所缩小，但收缩幅度不算大。至 20世纪之交，随着米价的攀升，这个剪刀差才进一步收缩，开始对程家的生计构成威胁。

除了茶叶收入稍有回落外，这一时期程家的其他现金收入也有一定缩水，其中最重要的是山货。山货在程家现金收入中的地位，在相当长的时间里仅次于茶叶，最高时全年收入可达近19元（分家前）。

程家历年收入一览表（单位：元）

但分家后，仅光绪二十一年、光绪二十二年超过10元（分别是13.20元和11元），其余年份都在9元以下。这一时期投入收购、加工黄精和挖掘葛根、制作葛粉的时间，都出现了大幅下降的情形。截至太平天国前期，在这两种山货的生产与贸易方面，程家共投入253日，占所有生计行事投入天数的8.42%；太平天国结束后至分家前，劳动投入增加至767.5日，在生计投入时间中的占比上升至 13.05%；分家后，劳动投入下降至99.5日，占比降至仅 2.65%，两者的时间投入，无论是绝对数量还是相对比例都大幅下降。细读排日账，山货收入的下降，跟葛粉产量的下降有直接关系。分家后，程家投入葛根挖掘的时间越来越少。这一方面跟分家后程家劳力的减少有一定关系，但更重要的原因，或许是经过数十年的密集挖掘后，葛根资源逐渐减少（同期制作葛巾的时间也减少了，两者应该是有内在关联的）。其结果是，分家后，程家从山货获取的现金收入逐渐下降，这对 19世纪 90年代以后的程家生计来说，无疑是一个不好的消息。

程家历年粮食产量估算表（单位：市斤）

不过，对程家生计带来影响的，并不限于米价、茶价的波动和山货收入下降的问题，借贷在其中也扮演了重要角色。根据托尼（ RichardH. Tawney）的说法，借贷是历史上乡民社会的基本问题之一，他曾经指出：“在所有小农经营耕作的国家里，乡村社会的根本问题并不是工资收入问题，而是借贷问题。”在困扰20世纪二三十年代中国农户生计的各种因素中，他将债务视为其中很重要的一项。他的看法得到了其他研究的证实。据陈翰笙30年代的调查，广东番禺调查的67个村子中，有50个村子的负债农户占70%以上。他估计，整个广东有三分之二的农户负有某种债务。他指出，广东农户的借债，十分之三是因为疾病、婚丧或其他临时的费用，而十分之七只是为了购买粮食养家糊口。由于本书开头谈到的那场光绪二十六年十月发生的危机是由债务引起的，我们有必要梳理一下此前十年（1891—1900）程家的债务状况。

从排日账看，程家在 19世纪七八十年代并非完全不举债，但这些债务数量不大，在程家的偿还能力范围内，这种状况一直延续至分家后最初几年。光绪十七年，程家仍无数额较大的举债记录。不过此年发开过世。次年，发开的妻子也亡故。当年出现了两笔举债记录。第一笔发生于三月初十日，通过抵押田皮一秤，向廷远祠借入5.5银元。第二笔发生于同月廿五日，从余味山祠借来英洋22元。这两次举债原因不详，不过主要原因估计有二：其一，支付前一年与本年为发开夫妇办理小规模丧葬仪式的开销；其二，支付同仓娶亲的聘金。光绪十八年四月二日，也就是允亨母亲过世不到三个月后，程家举行了订亲仪式，聘金46元，这是一笔不小的开销，已经超出了当时程家一年茶叶销售的毛收入。加上公堂礼、谢媒人钱及举办婚礼酒席等各种费用，这场婚礼的开销当不在60元以下。如计入排日账记录的其他相关开销，此年的仪式与礼物开销高达73元余，为全年总开支的68%。

程家历年开支结构表（光绪十八年—二十一年）（单位：元）

允亨的儿媳是在次年正月二十五日进门的。在此前后，发生了一系列借贷行为。第一笔发生于进门十天前，程家以田皮字一张为抵押，向一位村民借入英洋5元。第二笔发生于此次借贷一个月后，也以田皮字一张为押，从一个会社借入英洋 10元。这两次举债很可能是为了支付同仓成亲酒席的开销。这一年的第三笔借贷发生于八月十四日，当天允亨从兄长和一位村民处借来11元，当天归还给余氏云青祠，取回契字（总共花销 24元，另 13元由允亨自筹）。七天后，程家再次以田皮字为押，从余味山祠借入英洋 20元。这几次借贷应该也是为了处理娶亲的费用，而第三次借贷显示，程家期望通过资金的周转，保住自己的一块耕地。

光绪二十年三月初二日，程家支付了2.3元，赎回一处茶坦的契字，这是当年发生的唯一与借贷有关的行为，而且这次还是取赎而非举债。光绪二十一年发生两次借贷。第一笔发生于正月十五日，程家以庄下田契为押，从一位村民手上借入英洋13元。第二笔发生于六月初七日，这次以牛栏田契为押，从一位村民那里借入英洋 17元。光绪二十二年五月，程家再次以顿底田皮字为押，从一位邻居处借入英洋 10元。这三次举债的用途不详，很可能是为了分拆前两年所借债款的利息及支付光绪二十一年十二月十六日允亨孙子“做三朝”的开销。自光绪二十三年至二十五年，程家的排日账已佚。不过从光绪二十六年程家的债务清单看，这几年程家共借入 6笔款子，其中光绪二十四年（1898）借入4笔，光绪二十五年借入 2笔，总计 65元（详下），占清单所列债务总额的一半余。由于这几年的排日账没有保存下来，这几笔债务的用途已无从知晓。

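How much financial pressure did these debts put on the Cheng household? Consider the interest charged on loans in Tuochuan (沱川). Taken together, the loan cases recorded in the daily account books (排日账) fall into roughly three groups. First, interest-free loans, which were rare. As noted earlier, on the 19th day of the second month of Daoguang 20, Fakai (发开) borrowed a little over 8 taels of silver from his mother; no repayment is recorded, so the loan was presumably interest-free. On the 20th day of the fourth month of Guangxu 27, Yunheng (允亨) repaid 2 yuan to Youxing (有兴), and the account book notes "no interest" (无利); the money had been borrowed on the 16th day of the third month, so Youxing probably waived the interest because the term was short. Second, loans at about 10%, which were also fairly uncommon. On the 7th day of the seventh month of Xianfeng 5, Fakai borrowed money from Yan (彦兄) on terms "agreed at plus one" (言定加一), that is, an annual rate of 10%. On the 2nd day of the twelfth month of Guangxu 18, Yunheng borrowed 5 yuan from Chunyuan (春元) and repaid it on the 2nd day of the fifth month of the following year, exactly half a year later, with 240 wen in interest, which gives an annual rate of 9.6%. On the 14th day of the eighth month of Guangxu 19, Yunheng borrowed 6 yuan from his elder brother Yunxing (允兴); on the 6th day of the seventh month of Guangxu 21 he paid 1 yuan plus 100 copper cash in interest, an annual rate of roughly 9.2%. On the 17th day of the seventh month of Guangxu 22, however, he paid 1 yuan in interest, which raised the annual rate to 16.7%. This case shows that even close relatives charged interest that was far from low (the Yu Xiongneng loan below is another example). Third, loans at about 20%, by far the most common. On the 17th day of the second month of Daoguang 25, Fakai borrowed 400 wen from a local association (会社) and on the 22nd day of the second month of the following year repaid principal and interest totaling 480 wen, an annual rate of 20%. On the 14th day of the eighth month of Guangxu 19, Yunheng borrowed 5 yuan from the Qinwu ancestral hall (钦五祠) and on the 14th day of the eighth month of Guangxu 20 paid 1 yuan in interest, an annual rate of 20%. On the 10th day of the fourth month of Guangxu 18, Yunheng borrowed 1 yuan from his nephew Yu Xiongneng (余熊能) and repaid it on the 19th day of the sixth month with 30 wen in interest, a rate of 18%. On the 9th day of the eighth month of Guangxu 19, Yunheng borrowed 5 yuan from Zaozi's mother (灶子母) and on the 8th day of the eighth month of Guangxu 20 paid 1 yuan in interest, an annual rate of 20%. The first two of these four cases are loans from associations and ancestral halls, and the last two are loans from individuals; apart from the Yu Xiongneng case, where kinship may have lowered the rate slightly, all carry an annual rate of 20%. The account books also record cases in which land or houses were pledged and the interest was paid in rent grain; these too are rare and are not discussed here.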
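The annual rates quoted in these cases follow from simple (non-compounding) interest. The sketch below reproduces three of them. It assumes simple interest, the loan terms as given in the text, and an exchange rate of 1 yuan = 1,000 wen; that rate is an assumption chosen because it reproduces the 9.6% figure, and it is not stated in the text. The function name is illustrative.

```python
def annualized_rate(principal, interest, years):
    """Simple (non-compounding) annual interest rate."""
    return interest / principal / years


WEN_PER_YUAN = 1000  # assumed exchange rate; not stated in the text

# 400 wen borrowed from an association, 480 wen repaid about a year later
rate_association = annualized_rate(400, 480 - 400, 1.0)

# 5 yuan borrowed for half a year, 240 wen paid in interest
rate_half_year = annualized_rate(5 * WEN_PER_YUAN, 240, 0.5)

# 6 yuan borrowed from the elder brother, 1 yuan of interest for one year
rate_brother = annualized_rate(6, 1, 1.0)

for label, rate in [
    ("association loan", rate_association),
    ("half-year loan", rate_half_year),
    ("loan from elder brother", rate_brother),
]:
    print(f"{label}: {rate:.1%}")
```

The lunar-calendar dates are not converted to day counts here; the loan terms are taken as the round periods the text itself uses.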

参照光绪二十六年程家债务清单，从历年借贷数额看，光绪二十三年以前，程家的借贷总数累计54元；光绪二十四年、光绪二十五年两年累计70元。可见光绪二十三年之前，程家借贷问题还不算严重，光绪二十二年、二十三年甚至没有借入大笔款项（同时，光绪二十一年、二十二年程家的收入不错），如以20%的年息率计算，每年需偿付利息10.8元，其数额尚在基本可控范围内。相比之下，光绪二十四年后，借贷数量明显增加，光绪二十四年、二十五年，共借入70元，特别是光绪二十四年借入了50元，程家的财务状况急转直下。如以20%的年息率计算，每年需偿付利息24.8元，如以光绪二十六年程家的年收入计算，程家每年需支付的借贷利息，就高达年收入的64%，这还没计入米价上涨造成的经济压力及其他小笔借贷的利息。因此可以断定，随着债务的大幅增加，程家仅仅靠生计收入已无力偿清债务。使情况变得更糟的是，田地的抵押，意味着程家每年必须缴纳更多的地租，程家自身的口粮供给能力也受到影响。
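The interest burden estimated above can be checked in a few lines. This sketch uses only figures from the text: 54 yuan of debt before Guangxu 24, 70 yuan added in Guangxu 24 and 25, a 20% annual rate, and the 64% share. The household's income for Guangxu 26 is not given in this passage, so the last step back-computes the income implied by the 64% figure; that number is an inference, not a value from the source.

```python
ANNUAL_RATE = 0.20            # the most common rate in the account books
debt_before = 54.0            # yuan borrowed up to Guangxu 23
debt_added = 70.0             # yuan borrowed in Guangxu 24 and 25

interest_before = ANNUAL_RATE * debt_before
interest_after = ANNUAL_RATE * (debt_before + debt_added)

# Income implied by "interest equals 64% of annual income" (inferred)
implied_income = interest_after / 0.64

print(f"yearly interest before Guangxu 24: {interest_before:.1f} yuan")
print(f"yearly interest after Guangxu 25:  {interest_after:.1f} yuan")
print(f"implied annual income:             {implied_income:.2f} yuan")
```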

程家历年收入一览表（单位：元）

光绪二十六年发生的一笔不成功的交易，直接影响到程家资金的周转，也有必要稍做讨论。程家生产的茶叶，一般是由茶商前来沱川收购。这一年华北爆发义和团运动，茶叶市场似乎不太顺畅。根据当年的海关报告，截至 1900年上半年，中国多数地区贸易正常进行，华北地区只是到了 6月局势才开始变得严峻，但其他地区贸易照常进行，长江流域的局势风平浪静。在上海茶市方面，跟1899年相比， 1900年红茶出口英、德、美、俄四国的数量有相当程度的提高。报告还提到，此年徽州茶（Hyson）的交易数量跟上一年相似。不过报告也显示，1900年中国的绿茶出口量，比上一年少了13300多担（但较之 1898年增加 15100多担）。报告还提到，“绿茶市场于 6月8日开启，开始出售的是少量平水茶，其价格比前一季度开市低了大约10%。茶叶质量与 1899年不相上下；但由于对主要消费市场 —美国—的预期很糟，一开始成交量很小。但是，后来需求增长，7月中旬前，价格已回升了 5%—10%”。国际贸易的波动，尽管对总出口量的影响不大，但可能造成地方茶市的震荡。

据排日账记载，光绪二十六年五月四日（1900年5月31日），“己早晨挑茶乙头上小沱，遇汪顺意兄家卖，未卖，转回家”。五月十一日（ 6月7日），将茶叶售予休宁大连的一位茶商，总计茶叶177斤余，售价英洋29元余。不幸的是，由于某种原因，这位茶商一直没有支付购茶款。于是从此年六月至次年十二月底，允亨频频前往大连催账，但每次至多讨得一、二元，有时甚至空手而回。上海绿茶市场开市的日期，晚于程家出售春茶的时间，因此不能说开市初期茶市的行情，会直接影响到徽州地方茶市。不过上海茶商对市场的基本判断，会辗转影响到徽州茶市，则不无可能。毕竟，茶叶在一段时间内找不到买主的情形，是程家此前从未遇见过的问题。而且茶叶出口量的下降，也可能给茶市带来震荡。最后买入程家茶叶的吴发祥，是大连人，程家此前对其为人应有一定了解。如果他是一个经常赖账的人，程家应会有所耳闻。因此，此人可能受到茶市波动的影响，本身也折了本，因而无力偿付购茶款。这笔茶款的金额看似不大，不过对当时负债累累的程家来说，却事关自身的资金周转和借贷信用。无论如何，最终悲剧还是发生了。

此外，允亨自身的消费习惯，也给家计带来一定的压力。对比允亨与发开的排日账，允亨似乎不如发开节俭。他不时请朋友打平伙。他还有饮酒的嗜好，平日经常到食杂店买酒买菜。笔者观察到，分家后允亨买酒的次数似有变化，特别是到了光绪后期，经常买酒喝（参见第八章）。在生计逐渐恶化的时期，这种嗜好无疑会增加开支压力。

总体而言，程家家计危机的出现，主要原因不在于茶款没有着落导致的资金紧缺问题，而是经过数年的累积，程家举债的数额已经达到难以偿还的危险境地，即使在正常的年份，他们也丧失了偿清债务的能力。而这些债务的产生，并非由于国际的、全国性或区域性的政经变动，而是由于两三场人生礼仪，尤其是娶亲的昂贵开支。假如程家将娶亲时间推迟几年，他们还会借入这么大笔的债款吗？未必。但是我们能说，这场悲剧纯粹是因为允亨个人决策的错误？也许不能这么说，毕竟影响程家生计的米价、茶价波动，是受到区域性乃至全国性的市场影响的。因此，在这场灾难中，包括米价上涨、茶价稍有下降、山货逐渐枯竭在内的经济局势，加上义和团运动带来的短时段的政经局势，以及允亨的个人嗜好及作为家长做出的决策，都在这种灾难的发生过程中扮演了一定角色。

危机的应对

光绪二十六年十月的危机，似乎来得有些突然。事发七天前，允亨还在家中筹办一场酒席，并请人前来“做伙头办碗”。次日，接女婿，请来几位亲友吃酒。这似乎是允亨长女的出嫁酒。十九日至二十三日，允亨如常砍柴、休息。然后到了二十四日，便发生了债主带人抬走他家中猪的事。但继续往回看，我们发现，九月十五日，允亨就以 10元的价格，当出了一处田产（参见附录六）。那位债主很可能是了解到程家债台高筑、屡次讨债未果后，才带人抬走他的猪的。

危机发生后，允亨似乎有些震惊，接下来的两天内，他没有采取任何行动，似乎不知如何应对。二十四日，排日账只交代“己在家嬉”，又记录“欠少云先生娘来取账，旺成经手，带鸟人（鲸）〔掠〕玉猪去”。后来他在一张纸条上交代，带人前来讨债的债主是巧娇嫂，而抬走猪的是一位“烟鬼人”。次日写道，“己在家里事，欠账难身”。终于，十月廿六日，也即危机发生后的第三天，程氏父子委托本族的程敬敷和好友余添丁前来清理债务。当日，他们俩“到余架家、余竹孙家二家账项，了通无阻”。后面这两位是程家的债主，允亨大概请敬敷、添丁去商讨债务事宜。他们还拟了一份程家债务清单，这份清单夹在光绪二十六年排日账内，保存至今：

借来账项人员述后：
启架兄家：
癸巳八月廿乙日借来英洋贰拾元。有顿底田皮约乙纸。
乙未正月十五日借来英洋拾元。有庄下田皮约乙纸，又加拾贰员。
六月初七借来亦洋拾柒元。有牛栏田契乙纸。
祝孙兄家：
戊戌五月廿九日借来亦洋拾伍元。有顿底田皮约乙纸。
己亥五月廿八日借来英洋叁元，又利洋贰元。三共贰十元正。
兴良兄家：
戊戌七月十七日借来亦洋拾伍元。有顿底田皮约乙纸，中见胞兄。
素从祠：
己亥六月六日借来亦洋拾元。有庄下田皮约乙张。
培掘祠：
戊戌五月初八日借来亦洋拾伍元。有顿底田皮约乙纸。
万青兄：
戊戌二月初乙日借来英洋伍元。有牛栏田契乙纸。
成林祠：
甲午三月十六日借来英洋拾贰元。有牛栏田契乙纸，中见胞兄。

根据这份清单，程家借贷的重要账款共10笔，最早的是光绪十九年（1893）的一笔债款，最晚的是光绪二十五年（1899）的债款，其中光绪十九年借入20元，光绪二十年借入 12元，光绪二十一年借入 22元，光绪二十四年借入50元，光绪二十五年借入 20元，所涉债务共 124元，约当这一年程家茶叶销售毛收入的4倍多。

为偿清债务，程家采取了一系列措施。首先，十月二十七日，“出当青（布）三丈零七寸，又白布三丈八尺零八寸，又青布三丈五尺零贰寸，托兴娥嫂出当英洋贰元正”。同时，“又去英洋二员上素从祠利，掉字乙纸，（伏）〔复〕写一纸，写屋契字一张，付素从祠”。素从祠是清单所列债权人之一，程家借入 10元，此次除支付利息外，还重新立契，以房屋抵押，估计通过这个办法，取回了此前抵押的庄下田皮契。其次，十月三十日，“己同儿托余添灯兄、敬敷弟卖池鱼卅六斤，每洋四斤，计英洋八员，（低）〔抵？〕账”。将鱼塘养的鱼出售，得价8元抵债。再次，十一月初一日，出售顿底、庄下田皮二处，筹得英洋80元。初五日，又支银 5元还培拙祠（应即账单中的培掘祠），将顿底田皮契赎回，同时将菜园一处抵押给该祠，计价10元。初十日，大概账目基本处理完毕，请余添丁吃酒。

排日账中夹了一张纸条，上面交代了程家出售田皮等物业、财产的详情，很可能是允亨在料理账目的过程中写下的：

光绪二十六年十一月初乙日，巧娇嫂倩烟鬼人抢去猪乙口，因身该欠账项甚多，只得向家兄及瑞弟商情，将顿底併庄下贰处田皮共八秤，卖与余慰农兄家，计英洋捌拾员，支洋陆拾员还慰农兄，账项清讫。支洋拾贰元还兴良兄，帐目清讫。支洋伍元还培拙祠，下欠拾员，将门口前菜园押在祠内生殖。支洋柒元还巧娇婶，将猪乙口抵英洋陆元五角。又将塘鱼叁拾乙斤抵英洋柒元五角，三共还贰拾乙元，清讫。支洋贰元还素从祠利钱，下欠英洋拾元正，将身住屋当与祠内，长年加贰行息。

这份文件交代的信息，远不止于出售田皮，还包括前面提到的卖鱼等信息。出售田皮得到的 80元中， 60元是用于向买主还债，实际仅收到20元现金。然后偿还兴良 12元（上面的清单欠 15元）。程家共欠巧娇21元，猪估价 6.5元，鱼售价得 7.5元，另付 7元，偿清了债务。此外就是需要偿还几个祠堂的欠款，培拙祠欠款是15元，付还 5元，另欠 10元以一块菜园做抵押；素从祠欠款是10元，以房屋做抵押，这一点前面已谈到。对照前面的清单，程家还需偿还启架47元，万青 5元，成林祠 12元，共计64元，仍是一笔不小的欠款。

经过这场危机，程家无疑已经元气大伤，经济状况濒临破产。允亨自身似乎深受打击。十月二十九日，他在账中写到：“己在家事体多端。”十一月阴雨天气多，他常在家中休息。十二月，他接连生了八天病。手头拮据，他没有找医生诊治。十二月二十六日，他再进大连找吴发祥讨债，仍是一无所获。尽管如此，他也试图恢复正常生活。他继续参与劳动，上山砍柴、帮人扛木材。十一月二十三日，他托敬敷出燕山买来小猪一头，要价 2.8元，他手头没钱，买猪的钱只能暂时先欠着。

2024-11-30
拱玉书：楔形文字文明的特点

就字面意义而言，两河流域文明或美索不达米亚文明就是发生在两河流域的文明。这个定义只指出了文明发生的地点，只回答了“在哪”的问题，没有涉及这个文明的突出特点。这个文明的突出特点是什么？我认为是文字，即楔形文字。如果根据一个文明的特点来给这个文明下个定义，那么，我现在谈及的这个文明应该叫楔形文字文明，即用楔形文字记录语言以储存和传递信息的文明。这个定义可以摆脱地域束缚，把地理上不属于两河流域、却使用楔形文字记录自己的民族语言、因而属于楔形文字文化圈的古代西亚地区的文明都囊括在内。“书同文”是这个文明的最显著的特点，也是最大“公约数”。因此，我首先从文字谈起。

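一、"Shu tong wen" (书同文, "writing in the same script"). "Shu tong wen" means writing with one and the same script. In antiquity almost the whole of West Asia used, or had at some time used, cuneiform, so it can be said that these peoples "wrote in the same script". But their "shu tong wen" is only a surface phenomenon: it resembles the "shu tong wen" of Chinese civilization in appearance and differs from it in spirit. It resembles it in that, on the surface, for West Asia centered on ancient Mesopotamia and for Chinese civilization alike, "shu tong wen" means the use of a single script across a vast area that crosses administrative and even state boundaries: most of the peoples of ancient West Asia used cuneiform at some point, and Chinese civilization uses Chinese characters. It differs in that cuneiform was used to write more than one language, whereas Chinese characters write only one, Chinese (within China, that is).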

两河流域(底格里斯河和幼发拉底河)南部是楔形文字的发祥地。早在公元前3200年前后，苏美尔人就发明了楔形文字，并用它来记录自己的民族语言苏美尔语(苏美尔人把自己的语言叫作eme─gi7“土著语”)。早在公元前2700年前后的早王朝时期，苏美尔人在用楔形文字书写苏美尔语文献的同时，时而也用楔形文字书写阿卡德语文献。阿卡德王朝时期(公元前2334—前2154年)，阿卡德语成为官方语言。在此后的一个多世纪里，除一些文学作品外，几乎所有文献都用阿卡德语书写。由于楔形文字是为苏美尔语发明的，所有独体字(从发生的角度观察)都在形式上是象形字，功能上是表意字，有时兼用来表音(节)，所以，用这种文字体系表达(或记载)苏美尔语不成问题，但表达阿卡德语时却显得蹩脚。于是，这时的书吏对楔形文字的使用方式进行了改革：一、多数表意字不再用来表意，而是用来表音，即表音节；二、弃用大部分表意字，只保留一部分表意字的表意用法。这种改革改变了楔形文字的性质，使楔形文字从表意文字(logographic writing)变成了音节文字(syllabic writing)。不论是作为表意文字的楔形文字，还是作为音节文字的楔形文字，其中的任何单字，不论是独体字，还是复合字，都不能只表辅音，不表元音，而必须是表达音节，或元音—辅音式音节，如in、ap等，或辅音—元音式音节，如ba、ti等，抑或辅音—元音—辅音式音节，如tam、?ul等。公元前14世纪，地中海沿岸的乌迦里特出现了楔形字母文字，30个符号分别代表30个辅音，如b、d、?、t等，其中的27个字母是基本字母，3个字母属于附加字母，只用于一些特殊场合，例如用来表达外来借词。到了公元前6世纪的古波斯时期，在国家权力的干预和组织下，在传统埃兰楔文的基础上，波斯人治下的埃兰书吏创造了一个由36个音节符号、5个表意符号组成的文字体系，这个文字体系是在很短的时间内，专门为古波斯语量身打造的。在形式上和功能上，这套楔形符号体系与“字母文字”几乎没有区别，绝大多数学者认为，这36个符号中的任何符号，都不代表语音的最小单位语素(phoneme)，而代表音节(syllable)。我的看法不同，我认为这套符号体系是字母+表意的混合文字体系(下面将说明理由)。这套符号体系与此前的阿卡德(包括巴比伦和亚述)音节文字和埃兰音节文字都有很大区别。最大的区别在于用字量，阿卡德—巴比伦—亚述音节文字体系用字数量约600个符号，书写中埃兰语和新埃兰语的音节文字体系用字量约120个符号，而用来书写古波斯语的符号体系只有36个“音节”(实为字母)符号，加上5个表意符号，加起来不过41个符号。不论古波斯时期创造的这套文字体系属于字母文字，还是属于音节文字，这套文字体系在人类文明史上都是划时代的创新。

楔形文字的使用范围不限于两河流域，埃兰和古波斯帝国的统治中心都不在两河流域，曾借用楔形文字的赫梯人所处的位置更是与楔形文字发祥地的苏美尔相去甚远。公元前2500—前2400年间，楔形文字西传到了叙利亚地区，那里的埃布拉(Ebla)古国接受了楔形文字，开始用楔形文字记录自己的民族语言——埃布拉语(Eblaite)。至于埃布拉语属于西塞姆语还是东塞姆语，在学术界仍有争议；但确定无疑的是，它更接近古阿卡德语。在埃布拉语中，双音节或三音节词汇居多，不适合用表意文字表达，于是，埃布拉人把以表意为主的苏美尔楔形文字改造成为以表音(节)为主的音节文字，这与稍后的阿卡德帝国的做法是一样的。不过，目前还不能确定，究竟是阿卡德人效法埃布拉人，把苏美尔人的表意文字体系变成了音节文字体系，还是恰恰相反。两个族群所操的语言十分接近，在政治舞台上活跃的时间也大致相同，二者在文字方面的创新应该不是平行而独立的，更不应该是巧合，而是二者之中一个是创新者，一个是借鉴者。在阿卡德人统治时期，两河流域东边的埃兰人也接受了楔形文字，用来书写与达罗毗荼语(Dravidian)有关联的埃兰语。公元前1500年前后，小亚细亚的赫梯人也开始借用楔形文字来书写自己的民族语言——属于印欧语系的赫梯语。地中海沿岸的乌迦里特人于公元前14世纪甚至发明了楔形字母来书写属于西塞姆语的乌迦里特语。这套字母包括30个辅音字母和一个隔字符。

可见，古代西亚地区的“书同文”是真实的，但这种“书同文”只流于表面，背后的实际情况是：在“书同文”过程中，楔形文字经历了三次脱胎换骨的根本变化，第一次变化发生于公元前2400年前后，从表意文字体系发展出音节文字体系(或音节—表意体系)；第二次变化发生于公元前14世纪，在音节文字的基础上，地中海沿岸产生楔形字母，即乌迦里特字母(30个辅音符号)；第三次变化发生于公元前6世纪的古波斯帝国，在埃兰音节文字的基础上产生古波斯楔形字母+表意字的混合文字体系，36个字母+5个表意字。第一个在波斯波利斯(Persepolis)完整而准确地临摹古波斯语铭文的尼布尔(Karsten Niebuhr，1733—1815)在完全读不懂铭文的情况下，仅凭直觉判断，认为书写古波斯语的楔形文字是字母(Buchstaben)文字。德国的格罗特芬(G. F. Grotefend，1775─1853)是第一个成功解读古波斯语铭文的人，而他是把这种文字当作字母文字来解读的，因而获得成功，例如，他把书写“大流士”的7个符号解读为d─a─r─h─e─u─sh，显然，在格罗特芬看来，这七个符号就是七个字母，代表语音中的最小单位。从20世纪50年代起，学术著作中的古波斯字母表都成了音节表，a、i、u、ka、ku、ga、gu等等。专门研究古波斯语语法的美国宾大教授肯特(R. G. Kent)认为，每个辅音都自带一个“固有”(inherent)的元音。他一边这样认为，一边又将(仅举一例)“我是大流士”音译为adam：Drayavau?，而不是adama：Drayavau?a，这令人费解。依我浅见，古波斯的这套文字体系属于字母+表意字的混合文字体系，36个字母+5个表意字。在36个字母中，除三个元音(a、i、u)字母外，其余都是辅音字母，不自带“固有”的元音，元音需由阅读者根据语言中的正确形式自行添加。很多(如果不是全部的)文字体系，包括这套古波斯文字体系，都是为某种特定语言发明的，更是为以那种特定语言为母语的人发明的。就古波斯的这套字母而言，只要波斯人掌握了这套辅音字母的发音，就能正确地书写和阅读，也就是说，这套字母文字体系具有与生俱来的助记性质，不完全表达语言。

楔文的上述变化代表了人类历史上出现的三种主要的文字类型：表意文字、音节文字和字母文字。这三种类型产生的先后顺序是先有表意文字(公元前3200年前后)，若干世纪后产生音节文字(公元前2400前后)，再过千年后产生字母文字(公元前14世纪)，但这不代表文字由低级向高级的发展，更不是文字发展的三阶段。这三种文字类型没有高低之分和优劣之别，它们都是为适应各自所表达的语言的需要而产生的，都是原配语言的完美的可视符号。它们有各自的产生途径和发展规律，它们之间的关系不是取代关系，也不是晋级关系，而是互不干扰、平行发展、各走各路的关系。音节楔形文字产生后，作为表意的楔形文字并未退出历史舞台，而是继续使用。乌迦里特楔形字母产生后，很快就消失了，这也不是字母文字本身的错。古波斯时期的楔形字母+表意字的混合文字体系也很快走完了自己的生命历程。这也不是说这种文字体系本身多么不好而一定短命。某种文字体系的终结往往不是文字本身的原因，而是另有原因。

楔形文字的种种变化都发生在公元前。从楔文产生的公元前3200年前后，到公元前1世纪，公元前的这最后三千年见证了楔形文字本身的种种变化，包括楔形文字被多个古代民族借用来书写自己的民族语言。上古时代的整个西亚地区族群复杂，政治风云变幻莫测，文明周期相对较短，究其原因，其中有地理原因，这里是欧亚非的交汇点，也是各文明的汇聚点，民族交融和交锋从古到今一直在上演。除这个原因外，可能还存在一个重要原因，那就是，在这个地区，始终没有出现一个在人口数量上具有绝对优势、在文化上足够优秀、文化认同感足够强烈，以至于可以由此产生巨大的文化凝聚力、长期立于不败之地的主体民族(或族群)。

“书同文”本来可以带来文化上的凝聚力，但由于古代西亚的情况是同文不同语，同文不同种，所以，这种“同文”没有给这里的文化带来凝聚力，也没有给这里的人带来文化认同感。中华文明中的“书同文”是国家推行的政策，具有明确目的，那就是维护大一统，本身自带凝聚力和向心力。古代西亚地区楔形文字文化圈的“书同文”，是后进文化为保持自身文化的延续和发展而采取的拿来而后进行改造的措施，目的是为了在一种强势文化中保留自己的语言和文化，本身自带离心性，即脱离先进文化或至少与先进文化保持平行而不被完全融合或同化的离心性。

二、这个文明的另一个特点是尊同神。苏美尔人创造的或尊崇的各种神灵也被后来的不同族群所崇拜。苏美尔人尊崇的天神安(An)、“风”神恩利尔(Enlil)、智慧者恩基(Enki)、月神楠纳(Nanna)、战神和爱神伊楠娜(Inanna)、太阳神乌图(Utu)等等，也都是后来的阿卡德人、埃布拉人、巴比伦人和亚述人尊崇的神。多神崇拜始终是楔形文字文明的唯一宗教形式，这个文明的意识形态深深植根于多神崇拜。中巴比伦后期，即公元前1200年前后，开始出现独尊一神的倾向，但一神教始终没有能够打破多神崇拜的传统。很显然，楔形文字文明在宗教方面缺乏创新，或可谓守成有余、创新不足。

楔形文字文明中各族群崇拜的神绝不限于上面提到的几个或多个自然神，戴梅尔在1914年发表的《巴比伦万神殿》里罗列了3300个神的具体名称，在1950年的第2版中，神的数量增加到5580个，去掉重复的，仍有5367个，这还是仅限于巴比伦尼亚地区，不包括其他地区。舒鲁帕克遗址出土了很多早王朝时期(约公元前2500年)的神表，其中最大的一块神表泥版记载了560个神的名字，这些神都是苏美尔人崇拜的神，至少神的名字是苏美尔语，不包括名字属于非苏美尔语的神。一般说来，每个城市都有一到两个保护神，国王有自己的个人保护神，大概普通百姓也有自己的保护神，至少官员或社会名流如此。拉迦什出土的早王朝时期的文献常提到与邻邦发生冲突，也常提到冲突一方的主神对冲突另一方国王的某种行为不满，于是发动战争，为神而战，胜利也属于神。虽然国王们常常打着神的旗号发动战争，但针对的都不是对方的神，而是人。

神有等级，有大神，有小神，大神中还有等级，上面提到的神都是大神中的大神。不论是大神还是小神，神之间不存在仇恨，也不存在神之间的相互杀戮，《创世神话》中的神间大战发生在造人之前，与人间没有关系。人间的城市(国家)都有保护神，保护神的地位有高有低，但每个城市(国家)的政治、经济以及宗教地位并非取决于保护神的地位。尼普尔是例外，这里是众神之父(ab─ba─dingir─dingir─ré─ne─ke4)恩利尔的崇拜地，是苏美尔人的宗教中心，取得霸权的国王通常要到这里为恩利尔建立神庙或修缮神庙，为自己的统治或霸权营造合法性。这个所谓的宗教中心是个政权更迭的见证地，是君王政治表演的舞台，与普通百姓的信仰没有关系。在历史文献中也不乏某国之神奉恩利尔之命向另一国开战的例子，如拉迦什向温玛宣战被视为“宁吉苏神，恩利尔的战士，遵(恩利尔)正义之命，与吉萨(温玛)开战”。可见，一个神对某一城市(国家)而言是保护神，而对其他城市(国家)而言可能是威胁和灾难。多神崇拜的宗教信仰和一城一神(有的城市不止一神)的实际操作把历史上、文化上以及宗教等方面都高度认同的同一族群从精神上和物理上分割开来，在精神上和物理上都给这样的族群赋予了潜在的离心力，带来了分裂隐患。多神崇拜应该是楔形文字文明逐渐衰败而最终走向消亡的原因之一。

三、求一统也是这个文明的特点之一。大一统始终是有抱负的统治者的追求目标。乌鲁克早期文明(即公元前3200年前后)时期的政治大势目前尚无从知晓，早王朝时期(约公元前2800—2350年)的天下大势趋于明朗，这个时期城邦林立，战争频繁，城邦间常常相互攻伐，争夺地区霸权。公元前2330年前后，萨尔贡(Sargon)征服各邦，以阿卡德为都建立统一帝国，统治范围包括西至地中海、南到波斯湾的广大地区。这种统一局面仅仅维持了一个多世纪，之后很多传统的独立城邦就纷纷独立，这时又遭到古提(Gutium)人入侵，以两河为中心的广大西亚地区进入古提人统治时期。由于古提人留下的历史铭文极少，现代学者对这个时期的了解十分有限。根据《苏美尔王表》的记载，古提人的统治历经21王，享国91年零40天，而后遭到乌鲁克人图黑伽尔(Utuhegal)领导的苏美尔联军的驱逐，乌鲁克恢复独立，其他地区的传统城市(国家)也都恢复独立。乌尔娜玛(Urnamma，公元前2111—前2094年)很快把这些城市(国家)又统一在他的治下，建立了中央集权制国家，现代学者名之曰乌尔第三王朝，盛极一时。但仅仅历经五王便亡国，末王被俘往埃兰，两河流域再度陷入分裂，这种局面持续大约两个世纪。此后，汉穆拉比(Hammurapi，公元前1792—前1750年)建立统一帝国，享国约一个半世纪，于公元前1600年前后，灭于赫梯王穆尔什里一世(Mur?iliI)之手。赫梯人没有统治巴比伦尼亚的意图，班师回国。凯喜特人(Kassites)趁虚而入，取得巴比伦尼亚的统治权。凯喜特人既不是塞姆人，也不是苏美尔人，其语言归属问题至今悬而未解。凯喜特人不但接管了前朝天下，还继承和发扬了巴比伦人的文化传统，建立了稳固的政权，历经36王，享国近400年，从公元前1530年到前1155年。凯喜特王朝灭亡后，经海国第一王朝和伊辛第二王朝，西亚地区再次统一，这次是统一在亚述人的统治下，现代学者称这个时期为新亚述时期(约公元前1000—前625年)。公元前7世纪，权力中心又南移到巴比伦尼亚的迦勒底王朝(公元前625—前539年)。公元前539年，波斯人占领巴比伦，两河流域的历史进入波斯人统治时期，即古波斯时期(公元前539—前331年)。之后是亚历山大大帝(公元前336—前323年)的短暂统治。亚历山大去世后，西亚地区再次陷入分裂，在塞琉古统治时期，苏美尔书写传统一度在两河流域南部的文明发祥地乌鲁克复兴。目前发现的最后一块楔文泥版属于公元74年。至此，楔形文字文明彻底成为历史。

纵观楔形文字文明的整个发展、衰亡的历程可以发现，统一可以实现，但不可持续，原因很多，其中一个重要原因是参与这个文明的族群众多，但没有一个主体族群，即没有一个人数足够多，文化足够强，任何人也打不倒，即使一时倒下，也能再度复兴的主体族群。这是这个地区不断出现统一、分裂、再统一、再分裂，朝代不断更替、权力频频易主、传统逐渐丧失、文化一再受到冲击而最终彻底消亡的重要原因。如果说在楔形文字文化圈中哪个族群在一定程度上可称得上主体族群，那一定是苏美尔人，他们最接近“主体民族”的标准，他们发明了文字，创造了一套宗教体系，在文学艺术和科学技术方面也取得了卓越成就。他们的文明延续千余年，可谓千年不倒(从公元前3200—前1800年)，在倒下后的近两千年里影响仍在。到了纪元前后，这个文明才彻底消失。不可思议的是，这个曾经引领世界千余年的文明消失得如此彻底，以至于“苏美尔”和“苏美尔人”在希伯来《旧约圣经》和西方古典时期的著作中没有留下一点痕迹。没有近现代的考古发掘和文献学家的努力，就没有苏美尔文明的再现和复活。

四、最后谈谈宽容性。时代的变迁和朝代的更替往往都是在血雨腥风中实现的，即使是邻邦之间争夺土地或水源也会杀得尸横遍野。在历史文献中，很多君王极力鼓吹他们杀敌、洗城的功绩，到了新亚述时期，这种鼓吹更是达到登峰造极的程度。文献中的鼓吹也许就是现实中的真实。毋庸置疑，残酷性和血腥性是战争的常态。但也有少数例外，从这些例外中可以看到一些人性的光芒，值得了解，也值得借鉴。

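As early as around 2800 BC, Aga, king of Kish (Kiš) in northern Babylonia, led his army south and besieged Uruk in southern Mesopotamia. Gilgamesh, king of Uruk, led his people against the enemy and not only defeated this powerful foe but also captured the enemy king, who was campaigning in person. Yet Gilgamesh did not harm the invading king; he let him return home safely. Whatever the reason or purpose, this shows the good side of human nature and is an act of tolerance and forbearance. To be strong oneself, defeat the enemy, and then forgive him, tolerate him, and turn him into a friend: this is the confidence of the strong, and also their wisdom and goodness. That Gilgamesh was regarded as a model for ancient kings must have had to do with his strength, wisdom, goodness and tolerance. "Gilgamesh and Aga" celebrates exactly these qualities.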

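Cyrus (559–530 BC) of the Old Persian period carried the combination of strength and tolerance to its height. In 539 BC (538 BC by another account), Cyrus's army took Babylon. To the Babylonians the Persians were foreigners, and foreign invasions in history had always been bloody: the Akkadian empire, the empire of the Third Dynasty of Ur and the Old Babylonian empire all fell to foreign invasion, and the blows they suffered were devastating. Cyrus, however, adopted a conciliatory policy toward the Babylonians; in religious matters especially he showed tolerance and forbearance, for which the Babylonians were deeply grateful. Babylonian scribes therefore composed a text in praise of Cyrus's merits and wrote it on a barrel-shaped clay object, the "Cyrus Cylinder". The inscription tells not only that Cyrus allowed the people of the "Babylonian captivity" of Nebuchadnezzar's reign to return to their homes, but also of his other policies of religious tolerance: the divine statues belonging to "Sumer and Akkad" that had earlier been carried off to Susa were all returned unharmed to their original temples. By Sumerian, Babylonian and Assyrian tradition, destroying a city meant destroying its temples and its divine statues, or carrying the statues off as booty. Cyrus not only refrained from this but also restored to their owners the statues that had been taken to Susa, which to the Babylonians was an immense favor and act of tolerance; they were accordingly grateful and wrote in lavish praise of their benefactor. Cyrus was a formidable ruler of his age and a great conqueror; not long after taking Babylon he went on to campaign against the Massagetae and died in battle. His conciliation and tolerance toward the Babylonians can thus be called extraordinary. Why did Cyrus adopt such a policy toward the Babylonians alone? Perhaps out of respect or awe for an advanced culture. The Babylonians' long history and their superiority in culture, science and technology (astronomy above all) and literature were plain for all to see. The Cyrus Cylinder shows that Cyrus called the god Marduk "my lord" (EN─ia). Marduk was the chief god of the Babylonians, and for a conqueror to revere the chief god of the conquered is an identification in faith and in culture. That a conqueror identifies with the culture and religion of the conquered shows that he is willing and disposed to accept an advanced culture, and shows still more that an advanced culture carries a power of its own, the power to assimilate less advanced cultures.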

本文转自《世界历史》2023年第5期，有节略

2024-11-30
最优学习的85%规则
文章原题目：The Eighty Five Percent Rule for Optimal Learning

论文地址：https://www.biorxiv.org/content/10.1101/255182v1

1. “恰到好处”——学习的迷思

人们在学习新技能时，例如语言或乐器，通常会觉得在能力边界附近进行挑战时感觉最好——不会太难以至于气馁，也不会太容易以至于感到厌烦。

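Tradition has its doctrine of the mean, and everyday intuition says that things should be done "just right". Applied to learning, this means there is a "sweet spot" of difficulty, a "Goldilocks zone". In modern education research, teaching in this zone is considered most effective [1], and the zone can even account for differences in infants' attention between more and less learnable stimuli [2].

In the animal learning literature, this zone is the rationale behind "shaping" [3] and "fading" [4], in which an animal is taught an increasingly complex task by gradually raising the difficulty of training.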

在电子游戏中几乎普遍存在的难度等级设置中，也可以观察到这一点，即玩家一旦达到某种游戏水平，就会被鼓励、甚至被迫进行更高难度水平的游戏。

类似地，在机器学习中，对于各种任务进行大规模神经网络训练，不断增加训练的难度已被证明是有用的 [5,6]，这被称为“课程学习”（Curriculum Learning）[7] 和“自步学习”（Self-Paced Learning）[8]。

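Although these observations have a long history, it has remained unclear why one particular difficulty level should benefit learning, or what the optimal level actually is.

In this paper, the authors address the optimal training difficulty for a broad class of learning algorithms in the setting of binary classification tasks. More specifically, the paper focuses on learning algorithms based on gradient descent. In these algorithms, the model's parameters (for example, the weights of a neural network) are adjusted on the basis of feedback so as to reduce the average error rate over time [9]; that is, they descend the gradient of the error rate taken as a function of the model parameters.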

这种基于梯度下降的学习构成了人工智能中许多算法的基础，从单层感知器到深层神经网络[10]，并且提供了从感知[11]，到运动控制[12]到强化学习[13]等各种情况下人类和动物学习的定量描述。对于这些算法，论文就训练的目标错误率提供了最佳难度的一般结果：在相当温和的假设下，这一最佳错误率约为15.87%，这个数字会根据学习过程中的噪音略有不同。

论文从理论上表明，在这个最佳难度下训练可以导致学习速度的指数级增长，并证明了“85%规则”在两种情况下的适用性：一个简单的人工神经网络：单层感知机，以及一个更复杂、用来描述人类和动物的感知学习[11]的类生物神经网络（biologically plausible network）。

2. Computing the optimal training difficulty

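In a standard binary classification task, a human, animal or machine learner must make a binary label judgment about a simple input stimulus.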

例如，在心理学和神经科学[15,16]的随机点动实验（Random Dot Motion）范例中，刺激由一片移动的点组成 – 其中大多数点随机移动，但有一小部分连贯一致地向左或向右移动。受试者必须判断相应一致点的移动方向。

决定任务感知判断难度的一个主要因素是一致移动点所占的比例。如下图所示，一致点占0%时显然最难，100 %时最容易，在 50%时难度居中。

实验人员可以在训练过程中使用被称为“阶梯化”（staircasing）的程序[17]控制这些一致移动点的比例以获得固定的错误率。
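How a staircase can hold the error rate at a chosen value is easiest to see in simulation. The sketch below uses a weighted up/down rule, one common staircase variant and not necessarily the one used in the experiments cited here. It assumes a hypothetical learner whose error probability falls with the stimulus strength `delta` as a normal CDF; `run_staircase` and all parameter values are illustrative.

```python
import math
import random


def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def run_staircase(beta, target_error, trials=20000, step=0.05,
                  delta0=3.0, seed=0):
    """Weighted up/down staircase on the stimulus strength delta.

    After an error the task is made easier (delta goes up); after a
    correct response it is made harder (delta goes down). The two step
    sizes are weighted so that they balance at the target error rate.
    """
    rng = random.Random(seed)
    delta = delta0
    errors = []
    for _ in range(trials):
        p_error = normal_cdf(-beta * delta)  # hypothetical learner
        is_error = rng.random() < p_error
        errors.append(is_error)
        if is_error:
            delta += step * (1.0 - target_error)
        else:
            delta -= step * target_error
    tail = errors[trials // 2:]  # discard the initial transient
    return sum(tail) / len(tail), delta


observed_error, final_delta = run_staircase(beta=1.0, target_error=0.1587)
print(f"observed error rate: {observed_error:.3f}")
```

The rule settles where the expected upward and downward moves cancel, which happens when the error probability equals `target_error`.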

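The paper assumes that the learner's decision rests on a subjective decision variable h, computed from the stimulus vector x (for example, the motion directions of all the dots) by a function Φ:

$$h = \Phi(x, \phi) \tag{1}$$

where φ denotes the tunable parameters. It further assumes that this transformation yields a noisy representation of the true decision variable Δ (for example, the percentage of dots moving to the left), so that

$$h = \Delta + n \tag{2}$$

The noise n arises from the imperfect representation of the decision variable. It is assumed to be random, sampled from a zero-mean normal distribution with standard deviation σ. With Δ = 16, the probability distribution p(h) of the subjective decision variable is shown in Figure 1A.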

红色曲线是学习之后新的曲线，可以看到其分布标准差σ比原来有所降低，使更多变量分布在了Δ=16 附近。这就说明学习者在学习之后决策准确度有所提高。曲线下方的阴影区域面积（积分）对应于错误率，即在每个难度下做出错误响应的概率。

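If the decision boundary is set to 0, so that the model chooses option A when h > 0, option B when h < 0, and chooses at random when h = 0, then the error rate caused by the noisy representation of the decision variable is

$$\mathrm{ER} = \int_{-\infty}^{0} p(h \mid \Delta, \sigma)\, dh = F(-\Delta/\sigma) = F(-\beta\Delta) \tag{3}$$

where F(x) is the cumulative distribution function of the standard normal distribution, whose probability density function is p(x) = p(x | 0, 1), and β = 1/σ. In other words, with σ the standard deviation of the noise, β is the precision of the representation and measures the learner's skill level at a task of difficulty Δ: the smaller σ, the larger β, and the higher the skill.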
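The error rate under Gaussian noise is straightforward to evaluate numerically. A minimal sketch, assuming the relation ER = F(−βΔ) with F the standard normal CDF and β = 1/σ; the function names and the example values are illustrative.

```python
import math


def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def error_rate(delta, beta):
    """Probability of a wrong response at difficulty delta, precision beta."""
    return normal_cdf(-beta * delta)


# Easier stimuli (larger delta) and higher skill (larger beta)
# both lower the error rate.
for delta in (0.0, 0.5, 1.0, 2.0):
    print(delta, round(error_rate(delta, beta=1.0), 4),
          round(error_rate(delta, beta=2.0), 4))
```

At Δ = 0 the stimulus carries no information and the error rate is 0.5 whatever the precision.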

如图1B所示，无论学习前还是学习后，随着决策变得更容易（Δ增加），两条曲线皆趋于下降，从而使错误率变得更低。

但两条曲线的下降速度是不一样的：当β增加（σ变小）后，曲线更集中和陡峭，因此学习之后的红色曲线下降速度也更快，这表示学习者对任务挑战的技能水平越趋于完善。

由最初的公式(1) 可知，学习的目标是调整参数φ，使得主观决策变量 h 更好地反映真实决策变量Δ。即构建模型的目标应该是尽量去调整参数φ以便减小噪声 σ 的幅度，或者等效地去增加技能水平精度 β。

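One way to achieve this tuning is to adjust the parameters by gradient descent on the error rate, that is, to change the parameters over time t according to

$$\frac{d\phi}{dt} = -\eta\, \nabla_{\phi}\, \mathrm{ER}$$

where η is the learning rate. After rewriting the gradient in terms of the precision β, the paper finds that difficulty affects the speed of learning only through ∂ER/∂β, so the optimal difficulty is the one that maximizes the magnitude of ∂ER/∂β, as shown in Figure 1C. Clearly, the optimal difficulty Δ changes as a function of the precision β, which means the difficulty has to be adjusted in real time to the learner's skill level. However, because of the monotonic relationship between Δ and ER (Figure 1B), the optimal difficulty can be expressed in terms of the error rate ER instead, which gives Figure 1D.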
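The step from the gradient to the optimal difficulty can be made explicit. The following is a sketch under the Gaussian-noise assumption, taking ER = F(−βΔ) with F and p the standard normal CDF and density; it is a reconstruction from the definitions in this section, not a quotation of the paper's own derivation.

```latex
\begin{aligned}
\frac{\partial\,\mathrm{ER}}{\partial \beta}
  &= \frac{\partial}{\partial \beta} F(-\beta\Delta)
   = -\Delta\, p(\beta\Delta), \\
\frac{\partial}{\partial \Delta}\bigl[\Delta\, p(\beta\Delta)\bigr]
  &= p(\beta\Delta)\,\bigl(1 - \beta^{2}\Delta^{2}\bigr) = 0
  \;\Longrightarrow\; \Delta^{*} = \frac{1}{\beta}, \\
\mathrm{ER}^{*} &= F(-\beta\Delta^{*}) = F(-1).
\end{aligned}
```

The optimal difficulty Δ* depends on the current precision, but the error rate at that difficulty does not.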

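After this change of variables, the optimal difficulty expressed as an error rate is constant as a function of precision. This means that optimal learning can be achieved by clamping the error rate at a fixed value during training. The paper calculates that, for Gaussian noise, this fixed value is

$$\mathrm{ER}^{*} = F(-1) = \frac{1}{2}\left(1 - \operatorname{erf}\!\left(\frac{1}{\sqrt{2}}\right)\right) \approx 0.1587$$

that is, the error rate that is optimal for learning is about 15.87%.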
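The claim that the optimal error rate does not depend on skill can be checked numerically. This sketch assumes ER = F(−βΔ) with Gaussian noise, searches a grid of difficulties for the one that maximizes |∂ER/∂β| = Δ·p(βΔ), and reports the error rate there for several precisions. The function names and grid size are illustrative.

```python
import math


def normal_pdf(x):
    """Probability density function of the standard normal."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def optimal_error_rate(beta, grid=20000):
    """Error rate at the difficulty that maximizes |dER/dbeta|."""
    best_delta, best_grad = 0.0, 0.0
    for i in range(1, grid + 1):
        delta = 10.0 / beta * i / grid            # search (0, 10/beta]
        grad = delta * normal_pdf(beta * delta)   # |dER/dbeta|
        if grad > best_grad:
            best_delta, best_grad = delta, grad
    return normal_cdf(-beta * best_delta)


rates = {beta: optimal_error_rate(beta) for beta in (0.5, 1.0, 2.0, 4.0)}
for beta, rate in rates.items():
    print(f"beta = {beta}: optimal error rate = {rate:.4f}")
```

The best difficulty shifts with β, while the error rate at that difficulty stays at about 0.1587.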

3. 模拟验证：感知机模型

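To test the applicability of the "85% rule", the paper simulates the effect of training accuracy on learning in two settings. The first, from artificial intelligence, is the classic perceptron, one of the simplest artificial neural networks, which has been applied in fields ranging from handwriting recognition to natural language processing.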

感知机是一种经典的单层神经网络模型，它通过线性阈值学习过程将多元刺激 x 映射到二分类标签 y 上。为了实现这种映射，感知机通过神经网络权重进行线性变换，并且权重会基于真实标签 t 的反馈进行更新。也就是说，感知机只有在出错时才进行学习。自然的，人们会期望最佳学习与最大错误率相关。然而，因为感知机学习规则实际上是基于梯度下降的，所以前面的分析对这里也适用，即训练的最佳错误率应该是15.87％。
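The learning rule described here, in which the weights change only after a mistake, fits in a few lines. The sketch below is a plain perceptron on synthetic, linearly separable data. It does not reproduce the paper's procedure of holding the training error rate at a target value; the data generator, dimensions and names are all assumptions made for illustration.

```python
import random


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def sign(v):
    return 1 if v >= 0 else -1


def train_perceptron(samples, epochs=20, lr=1.0):
    """Classic perceptron rule: update the weights only on mistakes."""
    w = [0.0] * len(samples[0][0])
    for _ in range(epochs):
        for x, t in samples:
            y = sign(dot(w, x))          # predicted label
            if y != t:                   # learn only from errors
                w = [wi + lr * t * xi for wi, xi in zip(w, x)]
    return w


def accuracy(w, samples):
    return sum(sign(dot(w, x)) == t for x, t in samples) / len(samples)


rng = random.Random(0)
true_w = [rng.gauss(0, 1) for _ in range(5)]   # hidden labeling rule


def make_samples(n):
    data = []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(5)]
        data.append((x, sign(dot(true_w, x))))
    return data


train_set, test_set = make_samples(500), make_samples(500)
weights = train_perceptron(train_set)
test_accuracy = accuracy(weights, test_set)
print(f"test accuracy: {test_accuracy:.3f}")
```

Controlling difficulty as in the paper would additionally require choosing each training stimulus so that the current network errs on it at the target rate.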

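To test this prediction, the paper simulates perceptron learning at training error rates from 0.01 to 0.5 in steps of 0.01 (1,000 simulations per error rate). The amount of learning is measured by the precision β. As the theory predicts, the network learns most efficiently when trained at the optimal error rate. In Figure 2A, the color scale shows the relative precision β/βmax as a function of training error rate and training duration, and precision rises fastest when training at the optimal error rate. Figure 2B shows the learning dynamics at different error rates and confirms that the theory describes the simulations well.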

图2：“85%规则”下的感知机

4. 模拟验证：类生物神经网络

为了证明“85%规则”如何适用于生物系统学习，论文模拟了计算神经科学中感知学习的“Law和Gold模型”[11]。在训练猴子学会执行随机点运动的任务中，该模型已被证明可以解释包括捕捉行为、神经放电和突触权重等长期变化情况。在这些情况下，论文得出相同结果，即当训练以85％的准确率进行时，学习效率达到最大化。

具体来说，该模型假设猴子基于MT脑区的神经活动做出有关左右感知的决策。MT区在视觉系统的背侧视觉通路（Dorsal visual stream），是已知在大脑视觉中表征空间和运动信息的区域[15]，也被称为“空间通路”（where），相对的，视觉系统另一条腹侧视觉通路（Ventral visual stream）则表征知觉形状，也被称为“辨识通路”（what）。

在随机点动任务中，已经发现MT神经元对点运动刺激方向和一致相关性 COH 都有响应，使得每个神经元对特定的偏好方向响应最强，且响应的幅度随着相关性而增加。这种激发模式可通过一组简单的方程进行描述，从而对任意方向与相关刺激响应的噪声规模进行模拟。

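On the basis of these neural population responses, Law and Gold proposed that a separate brain area (the lateral intraparietal area, LIP) constructs the decision variable as a weighted sum of activity in MT. The key difference from the perceptron is that there is random neuronal noise that cannot be removed by learning, so no amount of learning can lead to perfect performance. Nevertheless, as the paper's calculations show, the presence of irreducible noise does not change the optimal training accuracy, which remains 85%.

Another difference between the Law and Gold model and the perceptron is the form of the learning rule. Specifically, the weights are updated by a reinforcement learning rule driven by the reward prediction error. Although this differs considerably from the perceptron learning rule, the Law and Gold model still performs gradient descent on the error rate [13], so learning is again optimal at an accuracy of about 85%.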

为了测试这一预测，论文以各种不同的目标训练误差率进行了模拟，每个目标用MT神经元的不同参数模拟100次。其中训练网络的精度β，则通过在1%到100%之间以对数变化的一组一致性测试上，拟合网络的模拟行为来进行估计。

如图3A所示，在训练网络精确度β作为训练错误率的函数下，蓝色的理论曲线很好描述了训练后的精度。其中灰点表示单次模拟的结果。红点对应于每个目标误差率的平均精度和实际误差率。

此外，在图3B中，以三条不同颜色测量曲线显示了三种不同训练错误率下行为的预期差异。可以看到，在误差率为 0.16 （接近 15.87%）的黄色曲线上，结果精确度高于过低或过高误差率的两条曲线，即取得了最优的训练效果。

5. 心流的数学理论

沿着相同的思路，论文的工作指向了“心流”状态的数学理论[17]。这种心理状态，即“个体完全沉浸在没有自我意识但具有深度知觉的控制”的活动，最常发生在任务的难度与参与者的技能完全匹配时。

这种技能与挑战之间平衡的思想，如图4A所示，最初通过包括另外两种状态的简单概念图进行描述：挑战高于技能时的“焦虑”和技能超过挑战时的“无聊”，在二者中间即为“心流”。

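These three qualitatively different regions (flow, anxiety and boredom) can be derived naturally from the model in this paper.

Take the precision β as the skill level and the reciprocal of the true decision variable, 1/Δ, as the challenge level. The paper finds that when challenge equals skill, flow is associated with a high learning rate and high accuracy, anxiety with a low learning rate and low accuracy, and boredom with high accuracy and a low learning rate (Figures 4B and 4C).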

也就是说，在技能与挑战水平相等时以“心流”状态进行的学习，具有最高的学习率和最高的准确性。

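The paper also cites a recent study by Vuorre and Metcalfe [18], which found that the subjective feeling of flow peaks on tasks that are subjectively rated as being of intermediate difficulty. In addition, a study of learning to control a brain-computer interface found that the subjectively reported optimal difficulty peaks at the difficulty associated with maximal learning, not at the difficulty associated with optimal decoding of neural activity [19].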

那么一个重要的问题来了，在使用最佳学习错误率，达到主观最佳任务难度即心流状态进行学习时，其学习速度究竟有多快？

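The paper answers this question by comparing learning at the optimal error rate with learning at a fixed but possibly suboptimal error rate, and with learning at a fixed difficulty. From the training error rate function it follows that, at a fixed error rate, the precision β grows with the square root of time t.

By contrast, when the error rate is not held fixed, that is, in ordinary learning with the decision variable Δ held fixed, the outcome depends strongly on the noise distribution. The paper derives an approximate solution for normally distributed noise, in which β grows at a much slower, logarithmic rate. Training at the optimal error rate therefore amounts to an exponential improvement over training at fixed difficulty. The original article compares the two growth trends in a figure at this point.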
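The difference between the two growth regimes can be seen by integrating the learning dynamics directly. The sketch below assumes the reduced dynamics dβ/dt = −η ∂ER/∂β = η·Δ·p(βΔ) with Gaussian noise, which is a simplification adopted here for illustration. It compares training at the optimal difficulty Δ = 1/β with training at a fixed Δ, using simple Euler steps; all parameter values are arbitrary.

```python
import math


def normal_pdf(x):
    """Probability density function of the standard normal."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def simulate(eta=1.0, beta0=1.0, delta_fixed=1.0, dt=0.01, steps=100_000):
    """Euler integration of precision under the two training schedules."""
    beta_opt = beta_fix = beta0
    for _ in range(steps):
        # Optimal schedule: delta = 1 / beta, so d(beta)/dt = eta * p(1) / beta
        beta_opt += dt * eta * normal_pdf(1.0) / beta_opt
        # Fixed difficulty: the gradient shrinks as beta grows
        beta_fix += dt * eta * delta_fixed * normal_pdf(beta_fix * delta_fixed)
    return beta_opt, beta_fix


total_time = 0.01 * 100_000
beta_opt, beta_fix = simulate()
# Closed form for the optimal schedule: beta^2 = beta0^2 + 2 * eta * p(1) * t
beta_opt_exact = math.sqrt(1.0 + 2.0 * normal_pdf(1.0) * total_time)

print(f"optimal error rate: beta = {beta_opt:.2f} (exact {beta_opt_exact:.2f})")
print(f"fixed difficulty:   beta = {beta_fix:.2f}")
```

Under these assumptions the optimally trained precision follows the square-root law, while at fixed difficulty the gradient decays like exp(−β²Δ²/2) and progress slows to a crawl.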

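Judging from the paper's tests on the perceptron and the Law and Gold model, and from its mathematical treatment of flow theory, it would be very interesting for future researchers to measure subjective engagement across different kinds of learning activities, to check whether it peaks at the point of maximal learning gradient, and to see whether the "85% rule" holds.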

然而这篇论文的作用还远不仅于此，下面就本文意义做进一步深入探讨。

6. 学习的定量时代？讨论、延伸与启示

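The importance of learning to individual organisms goes without saying, and is even greater than most people think. A paper published in Psychonomic Bulletin & Review in January 2013 [20] argues that learning is not only a cognitive process but, at a more fundamental functional level, a form of individual adaptation, namely changes in an organism's behavior that result from regularities in its environment. It further argues that, just as the theory of evolution is central to biology, the study of learning should be central to psychology.

Yet the many theories put forward since psychology emerged have mostly stopped at simple behavioral manipulation or conceptual description. Examples include the classical conditioning and operant conditioning of the behaviorists Pavlov and Skinner; the "zone of proximal development" theory of the Soviet psychologist Lev Vygotsky on children's education; the Yerkes–Dodson law relating motivation to performance; "comfort zone theory", built on comfort, learning and panic zones; Mihaly Csikszentmihalyi's "flow theory"; and Anders Ericsson's "deliberate practice".

These theories either stress that learning requires contingency with external stimuli or some reinforcement through positive reward and negative punishment; or stress the effects of learning over long time spans or the minimal actions within shorter ones; or look for a middle zone between task difficulty and skill level, or between motivation and performance. But they have never specified the conditions for reaching this state, which usually has to be found slowly by experienced educators through trial and error in actual teaching.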

而在这篇论文中，研究者考虑了在二分类任务和基于梯度下降的学习规则情况下训练准确性对学习的影响。准确计算出，当调整训练难度以使训练准确率保持在85％左右时，学习效率达到最大化，要比其他难度训练的速度快得多，会使学习效果指数级快于后者。

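In theory this result holds equally for artificial and for biologically plausible neural networks. The "85% rule" applies to a wide range of machine learning algorithms, including multilayer feedforward networks, recurrent neural networks, the various deep learning algorithms based on backpropagation, Boltzmann machines, and even reservoir computing networks [21, 22]. The analysis of maximizing the gradient ∂ER/∂β also shows that it applies to learning in biologically plausible networks, and indeed to any process that affects the precision of neural representations, such as attention, engagement, or more generally cognitive control [23, 24]. In the latter case, for example, the benefit of engaging cognitive control is greatest when ∂ER/∂β is maximized. Work on the Expected Value of Control theory [23, 24, 25] suggests that the learning gradient ∂ER/∂β is monitored by control-related brain areas such as the anterior cingulate cortex.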

因此可以说，本篇论文无论对计算机科学和机器学习领域研究，还是对心理学和神经科学研究，都具有重要的意义。

在前者，通过“课程学习”和“自步学习”诉诸广泛的机器学习算法，本文基于梯度下降学习规则思路下包括神经网络的各种广泛学习算法，都急需后续研究者进行探索和验证。在最佳学习率上，论文的工作仅仅是对机器学习学习效率数学精确化实例的第一步。并且同时也促使研究者思考：如何将这种最优化思路推广到在更广泛的环境和任务的不同算法中？例如贝叶斯学习，很明显和基于梯度下降的学习不同，贝叶斯学习很难受益于精心构建的训练集，无论先出简单或困难的例子，贝叶斯学习者会学得同样好，无法使用 ∂ER/∂β 获得“甜蜜点”。但跳开论文研究我们依然可以思考：有没有其它方法，例如对概念学习，通过更典型或具有代表性的样本、以某种设计的学习策略来加快学习速度和加深学习效果？

另一方面，这篇论文的工作同样对心理学、神经科学和认知科学领域有重大启示。

前面已经提到，有关学习理论大多止步于概念模型和定性描述。除了少数诸如心理物理学中的韦伯-费希纳定律（Weber-Fechner Law）这样，有关心理感受强度与物理刺激强度之间的精确关系，以及数学心理学（Mathematical psychology）的研究取向和一些结论，缺乏数学定量化也一直是心理学研究的不足之处。

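The conclusion of this paper, by contrast, is not only precise but also applies to any process that affects the precision of neural representations, including attention, engagement and, more generally, cognitive control. As noted above, if we adopt the view that "learning is not only a cognitive process but, at a more fundamental functional level, a process of individual adaptive change", its implications turn out to apply even more widely, far beyond cognition and learning in the usual sense.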

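For example, in research on perception and aesthetics, the physicist Richard Taylor of the University of Oregon found, through the study of visual fractal patterns, that if a blank sheet of paper is assigned dimension D = 1 and a completely blackened sheet D = 2, so that any drawn pattern has a dimension between 1 and 2, the human eye prefers patterns with D = 1.3 [26]. In fact, many natural objects have a fractal dimension of 1.3, and people feel most comfortable at this level of complexity. The fractal abstract paintings of some famous artists, such as Jackson Pollock, a leading figure of Abstract Expressionism, range between D = 1.1 and 1.9 (the middle column of the figure below; natural images are on the left and computer simulations on the right), and images with a higher fractal dimension feel more oppressive to the viewer [27].

In his processing fluency theory of aesthetic pleasure [28], the psychologist Rolf Reber proposes that we have such preferences because the brain can process this content quickly. When we can process something quickly, we respond positively. Fractal patterns with D = 1.3, for instance, are processed quickly and therefore produce a pleasant emotional response. Similar ideas have been put forward in design by the psychologist Donald Arthur Norman and in art by the art historian Ernst Gombrich.

Comparing D = 1.3 with the 15.87% error rate on a common scale, the former gives a ratio of the added fractal complexity to the whole, that is, of unknown to known (or familiar to surprising, order to complexity), of about 0.3/1.3 ≈ 23.08%, which is larger than 15.87%. This way of computing was first proposed by the mathematician George David Birkhoff in 1928 in the book Aesthetic Measure; he held that if O is order and C is complexity, the aesthetic measure of an object is M = O/C.

Hence, under the simplest estimate, one can similarly arrive at an "optimal aesthetic ratio" of about 23.08% additional information, at which the viewer feels most comfortable.

Of course, because there are different ways of computing information complexity, the above is only a very rough estimate. Aesthetic experience involves sensation, perception, cognition, attention and more, and precedes cognition and learning in the narrow sense, so the optimal aesthetic ratio should be larger than 15.87%. The exact value, however, is likely to vary considerably with environment, culture, subject and method of calculation; some scholars, for example, have studied measures based on Shannon entropy and Kolmogorov complexity [29].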

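Either way, the method and conclusions of this paper offer considerable inspiration and confidence. Whether in artificial intelligence, psychology or neuroscience, and whether the subject is learning, aesthetics, perception or attention, wherever intelligent agents process information we may be able to find a precise ratio at which an appropriate mix of the known and the unknown lets the agent reach some optimum in experience, control or cognition. The result of such a choice would be cumulative gains far beyond what the natural process achieves. In this sense, the paper may well influence not just particular lines of scientific research but the basic methods by which humans will come to understand and improve themselves.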

参考资料
1. Celeste Kidd, Steven T Piantadosi, and Richard N Aslin. The goldilocks effect: Human infants allocate attention to visual sequences that are neither too simple nor too complex. PloS one, 7(5):e36399, 2012.
2. Janet Metcalfe. Metacognitive judgments and control of study. Current Directions in Psychological Science, 18(3):159–163, 2009.
3. BF Skinner. The behavior of organisms: An experimental analysis. New York: D. Appleton-Century Company, 1938.
4. Douglas H Lawrence. The transfer of a discrimination along a continuum. Journal of Comparative and Physiological Psychology, 45(6):511, 1952.
5. J L Elman. Learning and development in neural networks: the importance of starting small. Cognition, 48(1):71–99, Jul 1993.
6. Kai A Krueger and Peter Dayan. Flexible shaping: How learning in small steps helps. Cognition, 110(3):380–394, 2009.
7. Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. Curriculum learning. In Proceedings of the 26th annual international conference on machine learning, pages 41–48. ACM, 2009.
8. M Pawan Kumar, Benjamin Packer, and Daphne Koller. Self-paced learning for latent variable models. In Advances in Neural Information Processing Systems, pages 1189–1197, 2010.
9. David E Rumelhart, Geoffrey E Hinton, Ronald J Williams, et al. Learning representations by back-propagating errors. Cognitive modeling, 5(3):1, 1988.
10. Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.
11. Chi-Tat Law and Joshua I Gold. Reinforcement learning can account for associative and perceptual learning on a visual-decision task. Nat Neurosci, 12(5):655–63, May 2009.
12. WI Schöllhorn, G Mayer-Kress, KM Newell, and M Michelbrink. Time scales of adaptive behavior and motor learning in the presence of stochastic perturbations. Human movement science, 28(3):319–333, 2009.
13. Ronald J Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning, 8(3-4):229–256, 1992.
14. Frank Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological review, 65(6):386, 1958.
15. William T Newsome and Edmond B Pare. A selective impairment of motion perception following lesions of the middle temporal visual area (mt). Journal of Neuroscience, 8(6):2201–2211, 1988.
16. Kenneth H Britten, Michael N Shadlen, William T Newsome, and J Anthony Movshon. The analysis of visual motion: a comparison of neuronal and psychophysical performance. Journal of Neuroscience, 12(12):4745–4765, 1992.
17. Mihaly Csikszentmihalyi. Beyond boredom and anxiety. Jossey-Bass, 2000.
18. Matti Vuorre and Janet Metcalfe. The relation between the sense of agency and the experience of flow. Consciousness and cognition, 43:133–142, 2016.
19. Robert Bauer, Meike Fels, Vladislav Royter, Valerio Raco, and Alireza Gharabaghi. Closed-loop adaptation of neurofeedback based on mental effort facilitates reinforcement learning of brain self-regulation. Clinical Neurophysiology, 127(9):3156–3164, 2016.
20. De Houwer J, Barnes-Holmes D, Moors A. What is learning? On the nature and merits of a functional definition of learning. https://www.ncbi.nlm.nih.gov/pubmed/23359420
21. Herbert Jaeger. The “echo state” approach to analysing and training recurrent neural networks-with an erratum note. Bonn, Germany: German National Research Center for Information Technology GMD Technical Report, 148(34):13, 2001.
22. Wolfgang Maass, Thomas Natschläger, and Henry Markram. Real-time computing without stable states: A new framework for neural computation based on perturbations. Neural computation, 14(11):2531–2560, 2002.
23. Amitai Shenhav, Matthew M Botvinick, and Jonathan D Cohen. The expected value of control: an integrative theory of anterior cingulate cortex function. Neuron, 79(2):217–240, 2013.
24. Amitai Shenhav, Sebastian Musslick, Falk Lieder, Wouter Kool, Thomas L Griffiths, Jonathan D Cohen, and Matthew M Botvinick. Toward a rational and mechanistic account of mental effort. Annual Review of Neuroscience, (0), 2017.
25. Joshua W Brown and Todd S Braver. Learned predictions of error likelihood in the anterior cingulate cortex. Science, 307(5712):1118–1121, 2005.
26. Hagerhall, C., Purcell, T., and Taylor, R.P. (2004). Fractal dimension of landscape silhouette as a predictor for landscape preference. Journal of Environmental Psychology 24: 247–55.
27. A Di Ieva.The Fractal Geometry of the Brain.
28. Rolf Reber, Norbert Schwarz, Piotr Winkielman.Processing Fluency and Aesthetic Pleasure:Is Beauty in the Perceiver’s Processing Experience.http://dx.doi.org/10.1207/s15327957pspr0804_3
29. Rigau,Jaume Feixas,Miquel Sbert,Mateu.Conceptualizing Birkhoff’s Aesthetic Measure Using Shannon Entropy and Kolmogorov Complexity. https://doi.org/10.2312/COMPAESTH/COMPAESTH07/105-112
2024-11-27
郭睿君卜宪群：中国古代文书行政与国家治理

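The beginnings of official documents can be traced back to the earliest period of writing. The Book of Documents (《尚书》) says: "惟殷先人，有册有典" (the forebears of Yin had their records and canons). Characters such as 令 (order), 告 (report) and 册 (register) in the Book of Documents and in oracle bone inscriptions already show some features of documents. Simple documentary exchanges also existed between the Shang king and the royal domain and the fang states (方国): oracle bone inscriptions record, for example, fang states reporting enemy movements to the Shang king and the king issuing orders for them to carry out. The texts by which the Zhou kings appointed officials and enfeoffed regional lords likewise had the function and character of documents and documentary administration. For want of a centralized system, however, documentary administration did not develop much further under the Three Dynasties, and the institutional framework for documents remained weak. From the Warring States period onward, a bureaucracy whose basic features were centralization and monarchical autocracy gradually took shape, driving the rapid development of documentary administration, and the relevant institutions and regulations emerged with it. The "Jin shi" (禁使) chapter of the Book of Lord Shang (《商君书》), for example, records: "夫吏专制决事于千里之外，十二月而计书以定事"; the "计书" (accounting reports) are documents. The "Nei shi za" (《内史杂》) among the Qin bamboo slips excavated at Shuihudi, Yunmeng, Hubei, records: "有事请殹（也），必以书，毋口请，毋（羁）请". By around the time of the Qin unification, government offices had thus established the rule that administration must proceed "in writing" (必以书), with no "口请" or "羁请" (oral requests, or requests made through an intermediary), which shows that documents were an indispensable form of administrative operation.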

秦汉大一统国家的建立，有力地推动了国家治理技术手段的发展，文书行政即是具体表现之一。秦汉是中国古代文书行政的重大发展时期，无论在文书类型划分、运转程序及制度规定上，都为中国古代文书行政的发展奠定了重要基础。秦汉初具系统化的文书制度建立，是与君主专制中央集权国家的建立紧密联系在一起的，也是以大一统中央集权郡县制为主体的国家治理的现实需求。大一统中央集权郡县制国家是一种全新的国家形态和治理形式，作为一种规定中央与地方关系的制度，秦汉中央集权主要表现在郡县制的不断发展完善上。秦汉中央政府日常事务的重点之一，就是对郡县乡里地方政务事务的立法、管理、监督与考核。及时有效传达信息，在各层级行政机构间显得极为重要，文书便成了信息传达的基本形式。严格的文书行政保障了政令畅通，加强了中央与地方之间的联系，是国家实现有效治理的重要保障。秦汉时期全国范围内郡县制的确立，以及完备的官僚制与统一的文字，又为秦汉在全国实现更为彻底的文书行政提供了可能。

汉承秦制，“萧何入秦，收拾文书。汉所以能制九州者，文书之力也”（《论衡·别通篇》），萧何入关中舍弃宫室财物而独取文书，反映了汉初统治者已经认识到文书在国家治理上的重要意义。至东汉，人们已经充分认识到“以文书御天下”的道理。

秦汉对行政文书已经作出合理的分类，大体有皇帝御用文书，官僚疏奏、上书，官府考绩管理公文，官府行移公文四种类型。皇帝文书大致分为制、诏、策、戒书，蔡邕《独断》说：“天子之言曰制诏，其命令，一曰策书，二曰制书，三曰诏书，四曰戒书。”郡县官僚奏疏、上书属于上行文书，《独断》中对汉代官僚奏疏的形式有详细介绍，大体有章、奏、表、驳议几种。秦汉各级机构为自身管理需要和定期向上汇报而制作了大量的考绩管理文书，如吏卒名籍、日迹簿、功劳簿、钱谷出入簿、文书收发记录等。地方与中央、地方与地方、个人与政府之间政务往来又产生了檄书、牒书、府书、记、爰书、变事书、奔命书、报书、举书、劾状等各类行移文书。秦汉统一了文书书写格式，对文书书写内容有明确的规定，对文书传递方式、对象与时效制定了严格要求，文书写作开始步入规范化、格式化阶段。如《秦律十八种·行书》规定：“行命书及书署急者，辄行之；不急者，日觱（毕），勿敢留。留者以律论之。”秦汉行政文书的收发文记录中，已包含了文书的起发与送达时间，以及递送者的姓名、职务等基本要素，形成严密规范的文书收发记录。

魏晋唐宋时期的文书调适与统一

魏晋南北朝时期，政局长期动荡，但各政权采取的主要还是中央集权郡县制行政方式，出于治理需要，文书行政不仅依然延续，且有所发展。这一时期已普遍称文书为“公文”，文书的名称和体例较秦汉时期规范得更为具体细致。文书书写材料也在这一时期实现了从简帛向纸张的重大变化，使得文书形式、文书制度都发生了重大调适。

纸本文书约出现在两汉时期，但简帛仍是当时文书书写的主要材料。魏晋以后，用纸书写的文书逐渐增多。至东晋末期，以行政指令的方式要求公文全部用纸书写。由于纸张的应用，公文印章方法由竹简上的封泥变为纸张上的朱色水印。又因在纸上便于押署，文书押署制度也在这一时期被定立下来，且押字变成了“骑缝”或“押缝”的形式。

魏晋南北朝时期开始出现一文一事制，即在请示类文书中，一件文书只陈述一件事项，不同的事项不得混杂于一件文书中。一文一事制通行于唐代，及至宋代已是成文制度。南宋《庆元条法事类·文书门》云：“其奏陈公事，皆直述事状，若名件不同，应分送所属，而非一宗事者，不得同为一状。”一文一事制能够突出文书的内容主旨，加快行文及览阅速度，提高文书处理的效率，也便于文书处理完毕后续的管理、查调及保管。

唐宋时期，中国进入一个大繁荣大变革时期，社会经济、文化思想获得高度发展，基本社会关系、社会阶层身份、土地占有制等方面发生了一系列变革。在此背景下，作为国家治理手段的文书行政也得到了空前的发展与强化，并显示出日趋统一的特征。

唐宋时期文书行政的日趋统一表现为规范化与制度化的推进，文书行政逐渐成为具有制度约束力的治理工具，有效提高了国家的治理能力。在中央，中书省负责起草文书，门下省负责审核文书，尚书省负责执行文书，发展出相对成熟的起草章程、审核机制、运行流程。在地方，各级机构都设立了不同称谓与职能的专职文书吏员。从中央至各级机构制定了更为统一的文书名称、体式及用途，对文书的一文一事、拟制和誊写、引黄和贴黄制度，文书的签押、用印、判署制度，文书的收发、登记、催办制度，文书的折叠、封装、编号、用纸制度等进一步规范。

唐宋时期文书行政制度化推进还表现为出现了专门围绕文书制度所著的文献及制度规定，如唐翰林学士杨钜《翰林学士院旧规》，是专门记载翰林院杂事及文书格式的专书，又如《唐六典》中有专门对文书的各项制度规定：“今册书用简，制书、慰劳制书、发日敕用黄麻纸，敕旨、论事敕及敕牒用黄藤纸，其赦书颁下诸州用绢。”再如改动、漏发、错发、造假、盗用文书等行为，皆会依据《唐六典》被处以惩戒。北宋司马光《书仪》、南宋《庆元条法事类·文书门》中专门记载了宋代上行文书的格式。特别是后者对文书分类、格式、运转、保管、违规处罚等都有详尽规定。文书行政的制度化推进表明，其已不仅是行政工具，而且成为具有权威性的政治符号，并带有一定的制度约束力。文书行政中专门机构与系统制度的完备，也使得文书与档案间的分工逐渐明确，档案相关事务开始从行政文书中分离出来，这是唐宋文书行政发展的重要标志之一。

元明清时期文书行政的完备与成熟

元朝疆域辽阔，为实现如此广袤疆域的稳固统治与有效治理，统治者建立起庞大的行政系统，同时沿用历代王朝所使用的文书行政体系。在军事战争频繁、疆域辽阔的背景之下，如何保障中央与地方之间沟通畅达与迅捷，及时掌握各地情况，强化对广大地区的管控，是对元代治理手段与治理能力的巨大考验，中国古代发达的驿传制度在这一时期应时而盛。

驿传制度是中国古代专门接待往来官员和负责政府文书传递事务的组织制度以及为此而征发的徭役制度，始自先秦，历经秦汉隋唐宋元明清，直至清末新式邮政建立才逐渐废止。驿传制度的关键在于驿站，驿站在秦汉时期称为“邮”“传”“亭”或“置”，五里一邮，十里一亭，三十里一置，文书传递方式根据工具分为“传车”“驿马”“步传”。隋唐以后也称为“驿”，并分为陆驿和水驿。自元代开始将“驿”“站”两字连在一起。元代的驿站有水站、陆站及水陆相兼站，除此之外还有前代王朝所未有的蒙古站、汉人站、海青站和海站。在驿站之外，元朝还设有急递铺作为补充。元代的驿传制度对明清具有重要影响，如明代的文书传递机构由驿站和急递铺组成，为确保文书投递的安全迅捷，还施行了驿传勘合制度。清代地方设驿、站、塘、台，分别由州、县官或专设的驿丞管理。除此之外每十五里也另设一急递铺，清末又增设了文报局和电报局。

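In the Ming and Qing, a series of changes in the state's political institutions concentrated the power of document-based administration entirely in the emperor's hands. The emperor had to deal with a great volume of documents, so the document system was adjusted to meet this need. In the Hongwu reign, besides issuing regulations to standardize document-based administration, Zhu Yuanzhang repeatedly issued edicts forbidding verbose writing and cutting back paperwork. The Ming shilu (《明实录》) records him as saying: "Empty words lose the facts, and ornate writing confounds the truth; I detest it deeply. From now on, anyone who by verbose writing wrongly aggravates or lightens another's guilt shall be punished." Yet the abuse of cumbersome documents kept returning however often it was banned. In the first year of Chongzhen (1628), the emperor ordered the Grand Secretariat to produce a model for tiehuang (贴黄) and ordered officials to summarize each memorial they submitted in about a hundred characters, to be pasted at the end of the text. The term tiehuang originated in the Tang: where an edict needed partial alteration, yellow paper was pasted on and the passage rewritten. In the Song, tiehuang was an important supplementary note pasted after the main text, while yinhuang (引黄) set out the main points of the text and the date of submission on yellow paper pasted on the cover or at the head of the memorial. The Ming tiehuang inherited and developed the Tang and Song tiehuang and yinhuang systems. The Qing also retained the tiehuang system and laid down stricter rules. Tiehuang allowed the emperor to grasp a document's content in the shortest possible time, shortening handling time and raising the efficiency of document processing.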

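Under the Qing, autocratic monarchy reached its peak and decision-making power became still more concentrated. As a result, major changes occurred in where document-based orders originated and were decided, and in the procedures by which documents were decided. One sign was that the real power of the Grand Secretariat (the power to decide on documents) gradually shifted to the Grand Council (军机处), which became the source and decision-making center for document-based orders throughout the country. Another was the use of the palace memorial (奏折): the confidential memorials of the Kangxi reign, known as zouzhe, were presented directly to the emperor and answered in his own hand, bypassing the Grand Secretariat's draft rescripts (票拟) and vermilion endorsement (批红). After Yongzheng came to the throne, palace memorials were used even more widely. "The reins of sovereign power were held by one alone. After palace memorials came into wider use, the emperor, seated on high in the palace, wielded the vermilion brush in person, and the ministers at court could not contribute a single word; such are the vermilion-endorsed edicts" (Siku quanshu zongmu 《四库全书总目》, juan 55, History section 11).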

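In the early and mid Qing, the document system, largely inherited from the Ming, was steadily refined without drastic change. In the late Qing, as the situation changed abruptly amid troubles at home and abroad, the dynasty embarked on a series of political reforms, and document-based administration changed greatly with them. The change was concentrated in the creation of new document agencies, such as a Confidential Affairs Section (机要科), a Records Section (案牍科), and a Secretarial Section (秘书科). As to the means of transmission, in the twenty-fourth year of Guangxu (1898) it was laid down that "henceforth, openly issued edicts shall all be transmitted by telegraph, and the governors-general and governors of the provinces shall at once act accordingly" (Donghua xulu 《东华续录》, Guangxu 147). In diplomatic documents, the relationships governing correspondence, document names, document forms, and diplomatic forms of address also underwent a series of changes.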

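To sum up, under autocratic monarchy and centralized rule, documents served as the basic vehicle of state governance. The monarch realized autocratic centralization through the power to decide on documents and, through imperial documents, promulgated and carried out ideas and strategies for governing the state. Shifts in the ruling stratum's ideas of governance, the making of governing institutions, and adjustments in the aims of governance were reflected directly in changes to document-based administration; the pattern of state power and the political order could even be altered and rebuilt by adjusting it. The central government used downward documents to put orders into effect, conveyed and enforced its guidelines and policies through replies to upward documents, and, through evaluation documents of various kinds, kept itself informed about local officials, supervised them, and followed developments in local society. Changes and adjustments to the document system never took place in isolation; they were shaped jointly by the internal and external environment, political institutions, instruments of governance, and the competence of the ruling group. The quality of the document system and the effectiveness of document-based administration in turn affected the level of state governance and reflected the state's capacity to govern. It must also be recognized, of course, that design and arrangement at the institutional level do not guarantee effective implementation and resolution at the level of practice. The Ming document system, for example, can be called complete: its documentary rules for money and grain and for judicial matters were especially tight, and the tally (勘合) system it created displayed superior administrative technique and ability. Yet sources such as the Chen liushi shu (《陈六事疏》) show that its actual implementation still faced many difficulties, difficulties that were often the common ailments of document-based administration under the ancient dynasties.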

Reprinted from Guangming Daily (《光明日报》), November 18, 2024, page 14.

2024-11-19
Richard Dawkins, The Genetic Book of the Dead: A Darwinian Reverie
Contents
1 Reading the Animal
2 ‘Paintings’ and ‘Statues’
3 In the Depths of the Palimpsest
4 Reverse Engineering
5 Common Problem, Common Solution
6 Variations on a Theme
7 In Living Memory
8 The Immortal Gene
9 Out Beyond the Body Wall
10 The Backward Gene’s-Eye View
11 More Glances in the Rear-View Mirror
12 Good Companions, Bad Companions
13 Shared Exit to the Future

1 Reading the Animal

You are a book, an unfinished work of literature, an archive of descriptive history. Your body and your genome can be read as a comprehensive dossier on a succession of colorful worlds long vanished, worlds that surrounded your ancestors long gone: a genetic book of the dead. This truth applies to every animal, plant, fungus, bacterium, and archaean but, in order to avoid tiresome repetition, I shall sometimes treat all living creatures as honorary animals. In the same spirit, I treasure a remark by John Maynard Smith when we were together being shown around the Panama jungle by one of the Smithsonian scientists working there: ‘What a pleasure to listen to a man who really loves his animals.’ The ‘animals’ in question were palm trees.

From the animal’s point of view, the genetic book of the dead can also be seen as a predictor of the future, following the reasonable assumption that the future will not be too different from the past. A third way to say it is that the animal, including its genome, embodies a model of past environments, a model that it uses to, in effect, predict the future and so succeed in the game of Darwinism, which is the game of survival and reproduction, or, more precisely, gene survival. The animal’s genome makes a bet that the future will not be too different from the pasts that its ancestors successfully negotiated.

I said that an animal can be read as a book about past worlds, the worlds of its ancestors. Why didn’t I use the present tense: read the animal as a description of the environment in which it itself lives? It can indeed be read in that way. But (with reservations to be discussed) every aspect of an animal’s survival machinery was bequeathed via its genes by ancestral natural selection. So, when we read the animal, we are actually reading past environments. That is why my title includes ‘the dead’. We are talking about reconstructing ancient worlds in which successive ancestors, now long dead, survived to pass on the genes that shape the way we modern animals are. At present it is a difficult undertaking, but a scientist of the future, presented with a hitherto unknown animal, will be able to read its body, and its genes, as a detailed description of the environments in which its ancestors lived.

I shall have frequent recourse to my imagined Scientist Of the Future, confronted with the body of a hitherto unknown animal and tasked with reading it. For brevity, since I’ll need to mention her often, I shall use her initials, SOF. This distantly resonates with the Greek sophos, meaning ‘wise’ or ‘clever’, as in ‘philosophy’, ‘sophisticated’, etc. In order to avoid ungainly pronoun constructions, and as a courtesy, I arbitrarily assume SOF to be female. If I happened to be a female author, I’d reciprocate.

This genetic book of the dead, this ‘readout’ from the animal and its genes, this richly coded description of ancestral environments, must necessarily be a palimpsest. Ancient documents will be partially over-written by superimposed scripts laid down in later times. A palimpsest is defined by the Oxford English Dictionary as ‘a manuscript in which later writing has been superimposed on earlier (effaced) writing’. A dear colleague, the late Bill Hamilton, had the engaging habit of writing postcards as palimpsests, using different-colored inks to reduce confusion. His sister Dr Mary Bliss kindly lent me this example.

Besides his card being a nicely colorful palimpsest, it is fitting to use it because Professor Hamilton is widely regarded as the most distinguished Darwinian of his generation. Robert Trivers, mourning his death, said, ‘He had the most subtle, multi-layered mind I have ever encountered. What he said often had double and even triple meanings so that, while the rest of us speak and think in single notes, he thought in chords.’ Or should that be palimpsests? Anyway, I like to think he would have enjoyed the idea of evolutionary palimpsests. And, indeed, of the genetic book of the dead itself.

Both Bill’s postcards and my evolution palimpsests depart from the strict dictionary definition: earlier writings are not irretrievably effaced. In the genetic book of the dead, they are partially overwritten, still there to be read, albeit we must peer ‘through a glass darkly’, or through a thicket of later writings. The environments described by the genetic book of the dead run the gamut from ancient Precambrian seas, via all intermediates through the mega-years to very recent. Presumably some kind of weighting balances modern scripts versus ancient ones. I don’t think it follows a simple formula like the Koranic rule for handling internal contradictions – new always trumps old. I’ll return to this in Chapter 3.

If you want to succeed in the world you have to predict, or behave as if predicting, what will happen next. All sensible prediction must be based on the past, and much sensible prediction is statistical rather than absolute. Sometimes the prediction is cognitive – ‘I foresee that if I fall over that cliff (seize that snake by its rattling tail, eat those tempting belladonna berries), it is likely that I will suffer or die in consequence.’ We humans are accustomed to predictions of that cognitive kind, but they are not the predictions I have in mind. I shall be more concerned with unconscious, statistical ‘as-if’ predictions of what might affect an animal’s future chances of surviving and passing on copies of its genes.

This horned lizard of the Mojave, whose skin is tinted and patterned to resemble sand and small stones, embodies a prediction, by its genes, that it would find itself born (well, hatched) into a desert. Equivalently, a zoologist presented with the lizard could read its skin as a vivid description of the sand and stones of the desert environment in which its ancestors lived. And now here’s my central message. Much more than skin deep, the whole body through and through, its very warp and woof, every organ, every cell and biochemical process, every smidgen of any animal, including its genome, can be read as describing ancestral worlds. In the lizard’s case it will no doubt spin the same desert yarn as the skin. ‘Desert’ will be written into every reach of the animal, plus a whole lot more information about its ancestral past, information far exceeding what is available to present-day science.

The lizard burst out of the egg endowed with a genetic prediction that it would find itself in a sun-parched world of sand and pebbles. If it were to violate its genetic prediction, say by straying from the desert onto a golf green, a passing raptor would soon pick it off. Or if the world itself changed, such that its genetic predictions turned out to be wrong, it would also likely be doomed. All useful prediction relies on the future being approximately the same as the past, at least in a statistical sense. A world of continual mad caprice, an environmental bedlam that changed randomly and undependably, would render prediction impossible and put survival in jeopardy. Fortunately, the world is conservative, and genes can safely bet on any given place carrying on pretty much as before. On those occasions when it doesn’t – say after a catastrophic flood or volcanic eruption or, as in the case of the dinosaurs’ tragic end when an asteroid-strike ravaged the world – all predictions are wrong, all bets are off, and whole groups of animals go extinct. More usually, we aren’t dealing with such major catastrophes: not huge swathes of the animal kingdom being wiped out at a stroke, but only those variant individuals whose predictions are slightly wrong, or slightly more wrong than those of competitors within their own species. That is natural selection.

The top scripts of the palimpsest are so recent that they are of a special kind, written during the animal’s own lifetime. The genes’ description of ancestral worlds is overlain by modifications and detailed refinements scripted since the animal was born – modifications written or rewritten by the animal’s learning from experience; or by the remarkable memory of past diseases laid down by the immune system; or by physiological acclimatisation, to altitude, say; or even by simulations in imagination of possible future outcomes. These recent palimpsest scripts are not handed down by the genes (though the equipment needed to write them is), but they still amount to information from the past, called into service to predict the future. It’s just that it’s the very recent past, the past enclosed within the animal’s own lifetime. Chapter 7 is about those parts of the palimpsest that were scribbled in since the animal was born.

There is also an even more recent sense in which an animal’s brain sets up a dynamic model of the immediately fluctuating environment, predicting moment to moment changes in real time. Writing this on the Cornish coast, I take envious pleasure in the gulls as they surf the wind battering the cliffs of the Lizard peninsula. The wings, tail, and even head angle of each bird sensitively adjust themselves to the changing gusts and updraughts. Imagine that SOF, our zoologist of the future, implants radio-linked electrodes in a flying gull’s brain. She could obtain a readout of the gull’s muscle-adjustments, which would translate into a running commentary, in real time, on the whirling eddies of the wind: a predictive model in the brain that sensitively fine-tunes the bird’s flight surfaces so as to carry it into the next split second.

I said that an animal is not only a description of the past, not just a prediction of the future, but also a model. What is a model? A contour map is a model of a country, a model from which you can reconstruct the landscape and navigate its byways. So too is a list of zeros and ones in a computer, being a digitised rendering of the map, perhaps including information tied to it: local population size, crops grown, dominant religions, and so on. As an engineer might understand the word, any two systems are ‘models’ of each other if their behavior shares the same underlying mathematics. You can wire up an electronic model of a pendulum. The periodicity of both pendulum and electronic oscillator is governed by the same equation. It’s just that the symbols in the equation don’t stand for the same things. A mathematician could treat either of them, together with the relevant equation written on paper, as a ‘model’ of any of the others. Weather forecasters construct a dynamic computer model of the world’s weather, continually updated by information from strategically placed thermometers, barometers, anemometers, and nowadays above all, satellites. The model is run on into the future to construct a forecast for any chosen region of the world.
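
The pendulum case can be written out as a rough sketch. The assumptions here are not in the text: the pendulum swings through small angles, and the electronic model is taken to be an ideal LC circuit, which is only one possible choice of oscillator. Under those assumptions the two systems obey the same equation, with different physical quantities standing behind the symbols.

```latex
% Small-angle pendulum: \theta = angular displacement, g = gravitational acceleration, \ell = length
\ddot{\theta} + \frac{g}{\ell}\,\theta = 0, \qquad \omega = \sqrt{g/\ell}

% Ideal LC circuit (assumed electronic model): q = charge, L = inductance, C = capacitance
\ddot{q} + \frac{1}{LC}\,q = 0, \qquad \omega = \frac{1}{\sqrt{LC}}

% Shared form: simple harmonic motion with period T = 2\pi/\omega
\ddot{x} + \omega^{2} x = 0, \qquad x(t) = A\cos(\omega t + \varphi)
```

Choosing L and C so that 1/(LC) equals g/\ell gives the circuit the same period as the pendulum, which is the sense in which each can serve as a model of the other.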

Sense organs do not faithfully project a movie of the outer world into a little cinema in the brain. The brain constructs a virtual reality (VR) model of the real world outside, a model that is continuously updated via the sense organs. Just as weather forecasters run their computer model of the world’s weather into the future, so every animal does the same thing from second to second with its own world model, in order to guide its next action. Each species sets up its own world model, which takes a form useful for the species’ way of life, useful for making vital predictions of how to survive. The model must be very different from species to species. The model in the head of a swallow or a bat must approximate a three-dimensional, aerial world of fast-moving targets. It may not matter that the model is updated by nerve impulses from the eyes in the one case, from the ears in the other. Nerve impulses are nerve impulses are nerve impulses, whatever their origin. A squirrel’s brain must run a VR model similar to that of a squirrel monkey. Both have to navigate a three-dimensional maze of tree trunks and branches. A cow’s model is simpler and closer to two dimensions. A frog doesn’t model a scene as we would understand the word. The frog’s eye largely confines itself to reporting small moving objects to the brain. Such a report typically initiates a stereotyped sequence of events: turning towards the object, hopping to get nearer, and finally shooting the tongue towards the target. The eye’s wiring-up embodies a prediction that, were the frog to shoot out its tongue in the indicated direction, it would be likely to hit food.

My Cornish grandfather was employed by the Marconi company in its pioneering days to teach the principles of radio to young engineers entering the company. Among his teaching aids was a clothesline that he waggled as a model of sound waves – or radio waves, for the same model applied to both, and that’s the point. Any complicated pattern of waves – sound waves, radio waves, or even sea waves at a pinch – can be broken down into component sine waves – ‘Fourier analysis’, named after the French mathematician Joseph Fourier (1768–1830). These in turn can be summed again to reconstitute the original complex wave (Fourier synthesis). To demonstrate this, Grandfather attached his clothesline to rotating wheels. When only one wheel turned, the rope executed serpentine undulations approximating a sine wave. When a coupled wheel rotated at the same time, the rope’s snaking waves became more complex. The sum of the sine waves was an elementary but vivid demonstration of the Fourier principle. Grandfather’s snaking rope was a model of a radio wave travelling from transmitter to receiver. Or of a sound wave entering the ear: a compound wave upon which the brain presumably performs something equivalent to Fourier analysis when it unravels, for example, a pattern even as complex as whispered speech plus intrusive coughing against the background of an orchestral concert. Amazingly, the human ear, well, actually, the human brain, can pick out here an oboe, there a French horn, from the compound waveform of the whole orchestra.
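
The two-wheel clothesline demonstration can be imitated numerically. The following is a minimal sketch rather than a description of how the ear or brain does it: the function names, the sample rate, and the two component frequencies (3 Hz and 7 Hz) are all illustrative assumptions. It sums two sine waves (Fourier synthesis) and then recovers their frequencies from the compound wave with a naive discrete Fourier transform (Fourier analysis).

```python
import cmath
import math


def synthesize(components, n_samples, sample_rate):
    """Fourier synthesis: sum sine waves given as (frequency_hz, amplitude) pairs."""
    return [
        sum(a * math.sin(2 * math.pi * f * t / sample_rate) for f, a in components)
        for t in range(n_samples)
    ]


def dominant_frequencies(signal, sample_rate, count):
    """Fourier analysis: naive O(n^2) DFT returning the `count` strongest frequencies in Hz."""
    n = len(signal)
    magnitudes = []
    for k in range(n // 2):  # only frequencies below the Nyquist limit
        coeff = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        magnitudes.append((abs(coeff), k * sample_rate / n))
    magnitudes.sort(reverse=True)
    return sorted(freq for _, freq in magnitudes[:count])


# Hypothetical "two wheels": a 3 Hz wave plus a weaker 7 Hz wave, sampled for one second.
SAMPLE_RATE = 64
wave = synthesize([(3, 1.0), (7, 0.5)], n_samples=64, sample_rate=SAMPLE_RATE)
print(dominant_frequencies(wave, SAMPLE_RATE, count=2))
```

The compound wave looks more complicated than either component, yet the analysis step picks the two original frequencies back out, which is the principle the snaking rope was meant to show. A real signal such as speech would call for a fast Fourier transform and many more samples.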

Today’s equivalent of my grandfather would use a computer screen instead of a clothesline, displaying first a simple sine wave, then another sine wave of different frequency, then adding the two together to generate a more complex wiggly line, and so on. The following is a picture of the sound waveform – high-frequency air pressure changes – when I uttered a single English word. If you knew how to analyse it, the numerical data embodied in (a much-expanded image of) the picture would yield a readout of what I said. In fact, it would require a great deal of mathematical wizardry and computer power for you to decipher it. But let the same wiggly line be the groove in which an old-fashioned gramophone needle sits. The resulting waves of changing air pressure would bombard your eardrums and be transduced to pulse patterns in nerve cells connected to your brain. Your brain would then without difficulty, in real time, perform the necessary mathematical wizardry to recognise the spoken word ‘sisters’.

Our sound-processing brain software effortlessly recognises the spoken word, but our sight-processing software has extreme difficulty deciphering it when confronted with a wavy line on paper, on a computer screen, or with the numbers that composed that wavy line. Nevertheless, all the information is contained in the numbers, no matter how they are represented. To decipher it, we’d need to do the mathematics explicitly with the aid of a high-speed computer, and it would be a difficult calculation. Yet our brains find it a doddle if presented with the same data in the form of sound waves. This is a parable to drive home the point – pivotal to my purpose, which is why I said it twice – that some parts of an animal are hugely harder to ‘read’ than others. The patterning on our Mojave lizard’s back was easy: equivalent to hearing ‘sisters’. Obviously, this animal’s ancestors survived in a stony desert. But let us not shrink from the difficult readings – the cellular chemistry of the liver, say. That might be difficult in the same way as seeing the waveform of ‘sisters’ on an oscilloscope screen is difficult. But nothing negates the main point, which is that the information, however hard to decipher, is lurking within. The genetic book of the dead may turn out to be as inscrutable as Linear A or the Indus Valley script. But the information, I believe, is all there.

The pattern to the right is a QR code. It contains a concealed message that your human eye cannot read. But your smartphone can instantly decipher it and reveal a line from my favourite poet. The genetic book of the dead is a palimpsest of messages about ancestral worlds, concealed in an animal’s body and genome. Like QR codes, they mostly cannot be read by the naked eye, but zoologists of the future, armed with advanced computers and other tools of their day, will read them.

To repeat the central point, when we examine an animal there are some cases – the Mojave horned lizard is one – where we can instantly read the embodied description of its ancestral environment, just as our auditory system can instantly decipher the spoken word ‘sisters’. Chapter 2 examines animals who have their ancestral environments almost literally painted on their backs. But mostly we must resort to more indirect and difficult methods in order to extract our readout. Later chapters feel their way towards possible ways of doing this. But in most cases the techniques are not yet properly developed, especially those that involve reading genomes. Part of my purpose is to inspire mathematicians, computer scientists, molecular geneticists, and others better qualified than I am, to develop such methods.

At the outset I need to dispel five possible misunderstandings of the main title, Genetic Book of the Dead. First is the disappointing revelation that I am deferring the task of deciphering much of the book of the dead to the sciences of the future. Nothing much I can do about that. Second, there is little connection, other than a poetic resonance, with the Egyptian Books of the Dead. These were instruction manuals buried with the dead, to help them navigate their way to immortality. An animal’s genome is an instruction manual telling the animal how to navigate through the world, in such a way as to pass the manual (not the body) on into the indefinite future, if not actual immortality.

Third, my title might be misunderstood to be about the fascinating subject of Ancient DNA. The DNA of the long dead – well, not very long, unfortunately – is in some cases available to us, often in disjointed fragments. The Swedish geneticist Svante Pääbo won a Nobel prize for jigsawing the genome of Neanderthal and Denisovan humans, otherwise known only from fossils; in the Denisovan case only three teeth and five bone fragments. Pääbo’s work incidentally shows that Europeans, but not sub-Saharan Africans, are descended from rare cases of interbreeding with Neanderthals. Also, some modern humans, especially Melanesians, can be traced back to interbreeding events with Denisovans. The field of ‘Ancient DNA’ research is now flourishing. The woolly mammoth genome is almost completely known, and there are serious hopes of reviving the species. Other possible ‘resurrections’ might include the dodo, passenger pigeon, great auk, and thylacine (Tasmanian wolf). Unfortunately, sufficient DNA doesn’t last more than a few thousand years at best. In any case, interesting though it is, Ancient DNA is outside the scope of this book.

Fourth, I shall not be dealing with comparisons of DNA sequences in different populations of modern humans and the light that they throw on history, including the waves of human migration that have swept over Earth’s land surface. Tantalisingly, these genetic studies overlap with comparisons between languages. For example, the distribution of both genes and words across the Micronesian islands of the Western Pacific shows a mathematically lawful relationship between inter-island distance and word-resemblance. We can picture outrigger canoes scudding across the open Pacific, laden with both genes and words! But that would be a chapter in another book. Might it be called The Selfish Meme?

Fifth, the present book’s title should not be taken to mean that existing science is ready to translate DNA sequences into descriptions of ancient environments. Nobody can do that, and it’s not clear that SOF will ever do so. This book is about reading the animal itself, its body and behaviour – the ‘phenotype’. It remains true that the descriptive messages from the past are transmitted by DNA. But for the moment we read them indirectly via phenotypes. The easiest, if not the only, way to translate a human genome into a working body is to feed it into a very special interpreting device called a woman.

The Species as Sculpture; the Species as Averaging Computer

Sir D’Arcy Thompson (1860–1948), that immensely learned zoologist, classicist, and mathematician, made a remark that seems trite, even tautological, but it actually provokes thought. ‘Everything is the way it is because it got that way.’ The solar system is the way it is because the laws of physics turned a cloud of gas and dust into a spinning disc, which then condensed to form the sun, plus orbiting bodies rotating in the same plane as each other and in the same direction, marking the plane of the original disc. The moon is the way it is because a titanic bombardment of Earth 4.5 billion years ago hived off into orbit a great quantity of matter, which then was pulled and kneaded by gravity into a sphere. The moon’s initial rotation later slowed, in a phenomenon called ‘tidal locking’, such that we only ever see one face of it. More minor bombardments disfigured the moon’s surface with craters. Earth would be pockmarked in the same way but for erosive and tectonic obliteration. A sculpture is the way it is because a block of Carrara marble received the loving attention of Michelangelo.

Why are our bodies the way they are? Partly, like the moon, we bear the scars of foreign insults – bullet wounds, souvenirs of the duellist’s sabre or the surgeon’s knife, even actual craters from smallpox or chickenpox. But these are superficial details. A body mostly got that way through the processes of embryology and growth. These were, in turn, directed by the DNA in its cells. And how did the DNA get to be the way it is? Here we come to the point. The genome of every individual is a sample of the gene pool of the species. The gene pool got to be the way it is over many generations, partly through random drift, but more pertinently through a process of non-random sculpture. The sculptor is natural selection, carving and whittling the gene pool until it – and the bodies that are its outward and visible manifestation – is the way it is.

Why do I say it’s the species gene pool that is sculpted rather than the individual’s genome? Because, unlike Michelangelo’s marble, the genome of an individual doesn’t change. The individual genome is not the entity that the sculptor carves. Once fertilisation has taken place, the genome remains fixed, from zygote right through embryonic development, to childhood, adulthood, old age. It is the gene pool of the species, not the genome of the individual, that changes under the Darwinian chisel. The change deserves to be called sculpting to the extent that the typical animal form that results is an improvement. Improvement doesn’t have to mean more beautiful like a Rodin or a Praxiteles (though it often is). It means only getting better at surviving and reproducing. Some individuals survive to reproduce. Others die young. Some individuals have lots of mates. Others have none. Some have no children. Others a swarming, healthy brood. Sexual recombination sees to it that the gene pool is stirred and shaken. Mutation sees to it that new genetic variants are fed into the mingling pool. Natural selection and sexual selection see to it that, as generation succeeds generation, the shape of the average genome of the species changes in constructive directions.

Unless we are population geneticists, we don’t see the shifting of the sculpted gene pool directly. Instead, we observe changes in the average bodily form and behaviour of members of the species. Every individual is built by the cooperative enterprise of a sample of genes taken from the current pool. The gene pool of a species is the ever-changing marble upon which the chisels, the fine, sharp, exquisitely delicate, deeply probing chisels of natural selection, go to work.

A geologist looks at a mountain or valley and ‘reads’ it, reconstructs its history from the remote past through to recent times. The natural sculpting of the mountain or valley might begin with a volcano, or tectonic subduction and upthrust. The chisels of wind and rain, rivers and glaciers then take over. When a biologist looks at fossil history, she sees not genes but things that eyes are equipped to see: progressive changes in average phenotype. But the entity being carved by natural selection is the species gene pool.

The existence of sexual reproduction confers on The Species a very special status not shared by other units in the taxonomic hierarchy – genus, family, order, class, etc. Why? Because sexual recombining of genes – shuffling the pack (American deck) – takes place only within the species. That is the very definition of ‘species’. And it leads me to the second metaphor in the title of this section: the species as averaging computer.

The genetic book of the dead is a written description of the world of no particular ancestral individual more than another. It is a description of the environments that sculpted the whole gene pool. Any individual whom we examine today is a sample from the shuffled pack, the shaken and stirred gene pool. And the gene pool in every generation was the result of a statistical process averaged over all those individual successes and failures within the species. The species is an averaging computer. The gene pool is the database upon which it works.
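
The ‘averaging computer’ can be illustrated with a toy calculation. This is a sketch under loud assumptions, none of which come from the text: a gene pool with just two alleles, A and B, each given a fixed average reproductive success (the hypothetical `fitness_a` and `fitness_b`), with drift, recombination, and mutation left out. It uses the standard textbook recursion for selection, in which each generation's allele frequency is the previous one weighted by average success.

```python
def next_frequency(p, fitness_a, fitness_b):
    """One generation of selection on a two-allele gene pool.

    p is the frequency of allele A; the return value is its frequency after
    every copy has been weighted by its average reproductive success.
    """
    mean_fitness = p * fitness_a + (1 - p) * fitness_b
    return p * fitness_a / mean_fitness


def sculpt(p0, fitness_a, fitness_b, generations):
    """Return the trajectory of allele A's frequency across generations."""
    trajectory = [p0]
    for _ in range(generations):
        trajectory.append(next_frequency(trajectory[-1], fitness_a, fitness_b))
    return trajectory


# Hypothetical numbers: A starts rare and carries a 5% average advantage.
trajectory = sculpt(p0=0.1, fitness_a=1.05, fitness_b=1.00, generations=200)
for generation in (0, 50, 100, 200):
    print(generation, round(trajectory[generation], 3))
```

No individual genome changes anywhere in this calculation. Only the pool-wide frequency shifts, generation by generation, which is the sense in which the gene pool rather than the individual is the thing being carved.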

2 ‘Paintings’ and ‘Statues’

When, like that Mojave Desert lizard, an animal has its ancestral home painted on its back, our eyes give us an instant and effortless readout of the worlds of its forebears, and the hazards that they survived. Here’s another highly camouflaged lizard. Can you see it on its background of tree bark? You can, because the photograph was taken in a strong light from close range. You are like a predator who has had the good fortune to stumble upon a victim under ideal seeing conditions. It is such close encounters that exerted the selection pressure to put the finishing touches to the camouflage’s perfection. But how did the evolution of camouflage get its start? Wandering predators, idly scanning out of the corner of their eye, or hunting when the light was poor, supplied the selection pressures that began the process of evolution towards tree bark mimicry, back when the incipient resemblance was only slight. The intermediate stages of camouflage perfection would have relied upon intermediate seeing conditions. There’s a continuous gradient of available conditions, from ‘seen at a distance, in a poor light, out of the corner of the eye, or when not paying attention’ all the way up to ‘close-up, good light, full-frontal’. The lizard of today has a detailed, highly accurate ‘painting’ of tree bark on its back, painted by genes that survived in the gene pool because they produced increasingly accurate pictures.

We have only to glance at this frog to ‘read’ the environment of its ancestors as being rich in grey lichen. Or, in another of Chapter 1’s formulations, the frog’s genes ‘bet’ on lichen. I intend ‘bet’ and ‘read’ in a sense that is close to literal. It requires no sophisticated techniques or apparatus. The zoologist’s eyes are sufficient. And the Darwinian reason for this is that the painting is designed to deceive predatory eyes that work in the same kind of way as the zoologist’s own eyes. Ancestral frogs survived because they successfully deceived predatory eyes similar to the eyes of the zoologist – or of you, vertebrate reader.

In some cases, it is not prey but predators whose outer surface is painted with the colours and patterning of their ancestral world, the better to creep up on prey unseen. A tiger’s genes bet on the tiger being born into a world of light and shade striped by vertical stems. The zoologist examining the body of a snow leopard could bet that its ancestors lived in a mottled world of stones and rocks, perhaps a mountainous region. And its genes place a future bet on the same environment as cover for its offspring.

By the way, the big cat’s mammalian prey might find its camouflage more baffling than we do. We apes and Old World monkeys have trichromatic vision, with three colour-sensitive cell types in our retinas, like modern digital cameras. Most mammals are dichromats: they are what we would call red-green colour-blind. This probably means they’d find a tiger or snow leopard even harder to distinguish from its background than we would. Natural selection has ‘designed’ the stripes of tigers, and the blotches of snow leopards, in such a way as to fool the dichromat eyes of their typical prey. They are pretty good at fooling our trichromat eyes too.

Also in passing, I note how surprising it is that otherwise beautifully camouflaged animals are let down by a dead giveaway – symmetry. The feathers of this owl beautifully imitate tree bark. But the symmetry gives the game away. The camouflage is broken.

I am reduced to suspecting that there must be some deep embryological constraint, making it hard to break away from left-right symmetry. Or does symmetry confer some inscrutable advantage in social encounters? To intimidate rivals, perhaps? Owls can rotate their necks through a far greater angle than we can. Perhaps that mitigates the problem of a symmetrical face. This particular photograph tempts the speculation that natural selection might have favoured the habit of closing one eye because it reduces symmetry. But I suppose that’s too much to hope for.

Subtly different from ‘paintings’ are ‘statues’. Here the animal’s whole body resembles a discrete object that it is not. A tawny frogmouth or a potoo resembling a broken stump of a tree branch, a stick caterpillar sculpted as a twig, a grasshopper resembling a stone or a clod of dry soil, a caterpillar mimicking a bird dropping, are all examples of animal ‘statues’.

The working difference between a ‘painting’ and a ‘statue’ is that a painting, but not a statue, ceases to deceive the moment the animal is removed from its natural background. A ‘painted’ peppered moth removed from the light-coloured bark that it resembles and placed on any other background will instantly be seen and caught by a predator. In this photograph, the background is a soot-blackened tree in an industrial area, which is perfect for the dark, melanic mutant of the same species of moth that you may have noticed less immediately by its side. On the other hand, the masquerading Geometrid stick caterpillar photographed by Anil Kumar Verma in India, if placed on any background, would have a good chance of still being mistaken for a stick and overlooked by a predator. That is the mark of a good animal statue.

Although a statue resembles objects in the natural background, it does not depend for its effectiveness on being seen against that background in the way that a ‘painting’ does. On the contrary, it might be in greater danger. A lone stick insect on a lawn might be overlooked, as a stick that had fallen there. A stick insect surrounded by real sticks might be spotted as the odd one out. When drifting alone, the leafy sea dragon’s resemblance to a wrack might protect it, at least more so than its seahorse cousin whose shape in no way mimics a seaweed. But would this statue be less safe when nestling in a waving bed of real seaweed? It’s a moot question.

Freshwater mussels of the species Lampsilis cardium have larvae that grow by feeding on blood, which they suck from the gills of a fish. The mussel has to find a way to put its larvae into the fish. It does it by means of a ‘statue’, which fools the fish. The mussel has a brood pouch for very young larvae on the edge of its mantle. The brood pouch is an impressive replica of a pair of small fish, complete with false eyes and false, very fish-like, ‘swimming’ movements. Statues don’t move, so the word ‘statue’ is strictly inappropriate, but never mind, you get the point. Larger fish approach and attempt to catch the dummy fish. What they actually catch – and it does them no good – is a squirt of mussel larvae.

This highly camouflaged snake from Iran has a dummy spider at the tip of its tail. It may look only half convincing in a still picture. But the snake moves its tail in such a way that it looks strikingly like a spider scuttling about. Very realistic indeed, especially when the snake itself is concealed in a burrow with only the tail tip visible. Birds swoop down on the spider. And that is the last thing they do. It is worth reflecting on how remarkable it is that such a trick has evolved by natural selection. What might the intermediate stages have looked like? How did the evolutionary sequence get started? I suppose that, before the tip of the tail looked anything like a spider, simply waggling it about was somewhat attractive to birds, who are drawn to any small moving object.

Both ‘paintings’ and ‘statues’ are easy-to-read descriptions of ancestral worlds, the environments in which ancestors survived. The stick caterpillar is a detailed description of ancient twigs. The potoo is a perfect model of long-forgotten stumps. Except that they are not really forgotten. The potoo itself is the memory. Twigs of past ages have carved their own likeness into the masquerading body of that caterpillar. The sands of time have painted their collective self-portrait on the surface of this spider, which you may have trouble spotting.

‘Where are the snows of yesteryear?’ Natural selection has frozen them in the winter plumage of the willow ptarmigan.

The leaf-tailed gecko recalls to our minds, though not his, the dead leaves among which his ancestors lived. He embodies the Darwinian ‘memory’ of generations of leaves that fell long before men arrived in Madagascar to see them, probably long before men existed anywhere.

The green katydid (long-horned grasshopper) has no idea that it embodies a genetic memory of green mosses and fronds over which its ancestors walked. But we can read at a glance that this is so. Same with this adorable little Vietnamese mossy frog.

Statues don’t always copy inanimate objects like sticks or pebbles, dead leaves, or tree branch stubs. Some mimics pretend to be poisonous or distasteful models, and inconspicuous is precisely what they are not. At first glance you might think this was a wasp and hesitate to pick it up. It’s actually a harmless hoverfly. The eyes give it away. Flies have bigger compound eyes than wasps. This feature is probably written in a deep layer of palimpsest that, for some reason, is hard to over-write. The largest anatomical difference between flies and wasps – two wings rather than four (the feature that gives the fly Order its Latin name, Diptera) – is perhaps also difficult to over-write. But maybe, too, that potential clue is hard to notice. What predator is going to take the time to count wings?

Real wasps, the models for the hoverfly mimicry, are not trying to hide. They’re the opposite of camouflaged. Their vividly striped abdomen shouts ‘Beware! Don’t mess with me!’ The hoverfly is shouting the same thing, but it’s a lie. It has no sting and would be good to eat if only the predator dared to attack it. It is a statue, not a painting, because its (fake) warning doesn’t depend on the background. From our point of view in this book, we can read its stripes as telling us that the ecology of its ancestors contained dangerous yellow-and-black stripy things, and predators that feared them. The fly’s stripes are a simulacrum of erstwhile wasp stripes, painted on its abdomen by natural selection. Yellow and black stripes on an insect reliably signify a warning – either true or false – of dire consequences to would-be attackers. The beetle to the right is another, especially vivid example.

If you came face to face with this, peering at you through the undergrowth, would you start back, thinking it was a snake?

It isn’t peering and it isn’t a snake. It’s the chrysalis of a butterfly, Dynastor darius, and chrysalises don’t peer. As a fine pretence of the front end of a snake, it’s well calculated to frighten. Never mind that rational second thoughts could calculate that it’s a bit on the small side to be a dangerous snake. There exists a distance – still close enough to be worrying – at which a snake would look that small. Besides, a panicking bird has no time for second thoughts. One startled squawk and it’s away. Having more time for reflection, the Darwinian student of the genetic book of the dead will read the caterpillar’s ancestral world as inhabited by dangerous snakes. Some caterpillars, whose rear ends pull the same snake trick, even move muscles in such a way that the fake eyes seem to close and open. Would-be predators can’t be expected to know that snakes don’t do that.

Eyes are scary in themselves. That’s why some moths have eyespots on their wings, which they suddenly expose when surprised by a predator. If you had good reason to fear tigers or other members of the cat family, might you not start back in alarm if suddenly confronted with this, the so-called owl moth of South East Asia?

There exists a distance – a dangerous distance – at which a tiger or a leopard would present a retinal image the same size as a close-up moth. OK, it doesn’t look very like any particular member of the cat family to our eyes. But there’s plenty of evidence that animals of various species respond to dummies that bear only a crude resemblance to the real thing – scarecrows are a familiar example, and there’s lots of experimental evidence as well. Black-headed gulls respond to a model gull head on the end of a stick, as though it were a whole real gull. A shocked withdrawal might be all it takes to save this moth.

I am amused to learn that eyes painted on the rumps of cattle are effective in deterring predation by lions.

We could call it the Babar effect, after Jean de Brunhoff’s lovable and wise King of the Elephants, who won the war against the rhinoceroses by painting scary eyes on elephant rumps.

What on Earth is this? A dragon? A nightmare devil horse? It is in fact the caterpillar of an Australian moth, the pink underwing. The spectacular eye and teeth pattern is not visible when the caterpillar is at rest. It is screened by folds of skin. When threatened, the animal pulls back the skin screen to unveil the display, and, well, all I can say is that if I were a would-be predator, I wouldn’t hang about.

The scariest false face I know? It’s a toss-up between the octopus on the left and the vulture on the right. The real eyes of the octopus can just be seen above the inner ends of the ‘eyebrows’ of the large, prominent false eyes. You can find the real eyes of the Himalayan griffon vulture if you first locate the beak and hence the real head. The false eyes of the octopus presumably deter predators. The vulture seems to use its false face to intimidate other vultures, thereby clearing a path through a crowd around a carcase.

Some butterflies have a false head at the back of the wings. How might this benefit the insect? Five hypotheses have been proposed, of which the consensus favourite is the deflection hypothesis: birds are thought to peck at the less vulnerable false head, sparing the real one. I slightly prefer a sixth idea, that the predator expects the butterfly to take off in the wrong direction. Why do I prefer it? Perhaps because I am committed to the idea that animals survive by predicting the future.

Paintings and statues aimed at fooling predators constitute the nearest approach achieved by any book of the dead to a literal readout, a literal description of ancestral worlds. And the aspect of this that I want to stress is its astounding accuracy and attention to detail. This leaf insect even has fake blemishes. The stick caterpillar shown earlier has fake buds.

I see no reason why the same scrupulous attention to detail should not pervade less literal, less obvious parts of the readout. I believe the same detailed perfection is lurking, waiting to be discovered, in internal organs, in brain-wiring of behaviour, in cellular biochemistry, and other more indirect or deeply buried readings that can be dug out if only we could develop the tools to do so. Why should natural selection escalate its vigilance specifically for the external appearance of animals? Internal details, all details, are no less vital to survival. They are equally subject to becoming written descriptions of past worlds, albeit written in a less transparent script, harder to decipher than this chapter’s superficial paintings and statues. The reason paintings and statues are easier for us to read than internal pages of the genetic book of the dead is not far to seek. They are aimed at eyes, especially predatory eyes. And, as already pointed out, predatory eyes, vertebrate ones at least, work in the same way as our eyes. No wonder it is camouflage and other versions of painting and sculpture that most impress us among all the pages of the book of the dead.

I believe the internally buried descriptions of ancestral worlds will turn out to have the same detailed perfection as the externally seen paintings and statues. Why should they not? The descriptions will just be written less literally, more cryptically, and will require more sophisticated decoding. As with the ear’s decoding of Chapter 1’s spoken word ‘sisters’, the paintings and statues of this chapter are effortlessly read pages from books of the dead. But just as the ‘sisters’ waveform, when presented in the recalcitrant form of binary digits, will eventually yield to analysis, so too will the non-obvious, non-skin-deep details of animals and their genes. The book of the dead will be read, even down to minute details buried deep inside every cell.

This is my central message, and it will bear repeating here. The fine-fingered sculpting of natural selection works not just on the external appearance of an animal such as a stick caterpillar, a tree-climbing lizard, a leaf insect or a tawny frogmouth, where we can appreciate it with the naked eye. The Darwinian sculptor’s sharp chisels penetrate every internal cranny and nook of an animal, right down to the sub-microscopic interior of cells and the high-speed chemical wheels that turn therein. Do not be deceived by the extra difficulty of discerning details more deeply buried. There is every reason to suppose that painted lizards or moths, and moulded potoos or caterpillars, are the outward and visible tips of huge, concealed icebergs. Darwin was at his most eloquent in expressing the point.

It may be said that natural selection is daily and hourly scrutinising, throughout the world, every variation, even the slightest; rejecting that which is bad, preserving and adding up all that is good; silently and insensibly working, whenever and wherever opportunity offers, at the improvement of each organic being in relation to its organic and inorganic conditions of life. We see nothing of these slow changes in progress, until the hand of time has marked the long lapse of ages, and then so imperfect is our view into long past geological ages, that we only see that the forms of life are now different from what they formerly were.

3 In the Depths of the Palimpsest

It’s all very well for me to say an animal is a readout of environments from the past, but how far into the past do we go? Every twinge of lower-back pain reminds us that our ancestors only 6 million years ago walked on all fours. Our mammalian spine was built over hundreds of millions of years of horizontal existence when the working body depended on it – depended in the literal sense of hanging from it. The human spine was not ‘meant’ to stand vertically, and it understandably protests. Our human palimpsest has ‘quadruped’ boldly written in a firm hand, then over-written all too superficially – and sometimes painfully – with the tracery of a new description – biped. Parvenu, Johnny-come-lately biped.

The skin of Chapter 1’s Mojave horned lizard proclaimed to us an ancestral world of sandy, stony desert, but that world was presumably recent. What can we read from the palimpsest about earlier environments? Let’s begin by going back a very long way. As with all vertebrates, lizard embryos have gill arches that speak to us of ancestral life in water. As it happens, we have fossils to tell us that the watery scripts of all terrestrial vertebrates, including lizards, date back to Devonian times and then back to life’s marine beginning. The poetic point has often been made – I associate it with that salty, larger-than-life intellectual warrior JBS Haldane – that our saline blood plasma is a relic of Palaeozoic seas. In a 1940 essay called ‘Man as a Sea Beast’, Haldane notes that our plasma is similar in chemical composition to the sea but diluted. He takes this as an indication, not a very strong one in my reluctant opinion (‘reluctant’ because I like the idea), that Palaeozoic seas were less salty than today’s:

As the sea is always receiving salt from the rivers, and only occasionally depositing it in drying lagoons, it becomes saltier from age to age, and our plasma tells us of a time when it possessed less than half its present salt content.

The phrase ‘tells us of a time’ resonates congenially with the title of this book. Haldane goes on:

we pass our first nine months as aquatic animals, suspended in and protected by a salty fluid medium. We begin life as salt-water animals.

Whatever the plausibility of Haldane’s inference about changing salinity, what is undeniable is this. All life began in the sea. The lowest level of palimpsest tells a story of water. After some hundreds of millions of years, plants and then a variety of animals took the enterprising step out onto the land. Following Haldane’s fancy, we could say they eased the journey by taking their private sea water with them in their blood. Animal groups that independently took this step include scorpions, snails, centipedes and millipedes, spiders, crustaceans such as woodlice and land crabs, insects (who later took a further giant leap into the air) and a range of worms who, however, never stray far from moisture to this day. All these animals have ‘dry land’ inscribed on top of the deeper marine layers of palimpsest. Of special interest to us as vertebrates, the lobefins, a group of fish represented today by lungfish and coelacanths, crawled out of the sea, perhaps initially only in search of water elsewhere but eventually to take up permanent residence on dry land, in some cases very dry indeed. Intermediate palimpsest scripts tell of juvenile life in water (think tadpole) accompanying adult emergence on land.

That all makes sense. There was a living to be made on land. The sun showers the land with photons, no less than the surface of the sea. Energy was there for the taking. Why wouldn’t plants take advantage of it via green solar panels, and then animals take advantage of it via plants? Do not suppose that a mutant individual suddenly found itself fully equipped genetically for life on land. More probably, individuals of an enterprising disposition made the first uncomfortable moves. This was perhaps rewarded by a new source of food. We can imagine them learning to make brief, snatch-and-grab forays out of water. Genetic natural selection would have favoured individuals who were especially good at learning the new ploy. Successive generations would have become better and better at learning it, spending less and less time in the sea.

The general name for learned behaviour becoming genetically incorporated is the Baldwin Effect. Though I won’t discuss it further here, I suspect that it’s important in the evolution of major innovations generally, perhaps including the first moves towards defying gravity in flight. In the case of the lobe-finned fishes who left the water in the Devonian period around 400 million years ago, there are various theories for how it happened. One that I like was proposed by the American palaeontologist AS Romer. Recurrent drought would have stranded fishes in shrinking pools. Natural selection favoured individuals able to leave a doomed pool and crawl overland to find another one. A point in strong favour of the theory is that there would have been a continuous range of distances separating the pools. At the beginning of the evolutionary progression, a fish could save its life by crawling to a neighbouring pool only a short distance away. Later in evolution, more distant pools could be reached. All evolutionary advances must be gradual. A suffocating fish’s ability to exploit air requires physiological modification. Major modification cannot happen in one fell swoop. That would be too improbable. There has to be a gradient of step-by-step small improvement. And a gradient of distances between pools, some near, some a bit further, some far, is exactly what is needed. We shall meet the point again in Chapter 6, with the astonishingly rapid evolution of cichlid fishes in Lake Victoria. Unfortunately, Romer prefaced his theory by quoting evidence that the Devonian was especially prone to drought. When this evidence was called into question, Romer’s whole theory suffered in appreciation. Unnecessarily so.

In whatever way the move to the land happened, profound redesign became necessary. Water really is a very different environment from airy land. For animals, the move out of water was accompanied by radical changes in anatomy and physiology. Watery scripts at the base of the palimpsest had to be comprehensively over-written. It is the more surprising that a large number of animal groups later went into reverse, throwing their hard-won retooling to the winds as they trooped back into the water. Among invertebrates, the list includes pond snails, diving bell spiders, and water beetles. The water that they re-invaded is fresh water, not sea. But some vertebrate returnees, notably whales (including dolphins), sea cows, sea snakes, and turtles, went right back into the salted marine world that their ancestors had taken such trouble to leave.

Seals, sea lions, walruses, and their kin, also Galapagos marine iguanas, only partially returned to the sea, to feed. They still spend much time on land, and breed on land. So do penguins, whose streamlined athleticism in the sea is bought at the cost of risible maladroitness on land. You cannot be a master of all trades. Sea turtles laboriously haul themselves out on land to lay eggs. Otherwise, they totally recommitted to the sea. As soon as baby turtles hatch in the sand, they lose no time in racing down the beach to the sea. Lots of other land vertebrates moved part-time into fresh water, including snakes, crocodiles, hippos, otters, shrews, tenrecs, rodents such as water voles and beavers, desmans (a kind of mole), yapoks (water opossums), and platypuses. These still spend a good deal of time on land, taking to the water mainly to feed.

Sea turtle

You might think that returnees to water would unmask the lower layers of palimpsest and rediscover the designs that served their ancestors so well. Why don’t whales, why don’t dugongs, have gills? Their embryos, like the embryos of all mammals, even have the makings of gills. It would seem the most natural thing in the world to dust off the old script and press it into service again. That doesn’t happen. It’s almost as though, having gone to such trouble to evolve lungs, they were reluctant to abandon them, even if, as you might think, gills would serve them better. Given gills, they wouldn’t have to keep coming to the surface to breathe. But rather than revive the gill, what they did was stick loyally to the lung, even at the cost of profound modifications to the whole air-breathing system, to accommodate the return to water.

They changed their physiology in extreme ways such that they can stay under water for over an hour in some cases. When whales do come to the surface, they can exchange a huge volume of air very quickly in one roaring gulp before submerging again. It’s tempting to toy with the idea of a general rule stating that old scripts from lower down the palimpsest cannot be revived. But I can’t see why this should in general be true. There has to be a more telling reason. I suspect that, having committed their embryological mechanics to air-breathing lungs, the repurposing of gills would be a more radical embryological upheaval, more difficult to achieve than rewriting superficial scripts to modify the air-breathing equipment.

Sea snakes don’t have gills, but they obtain oxygen from water through an exceptionally rich blood supply in the head. Again, they went for a new solution to the problem, rather than revive the old one. Some turtles obtain a certain amount of oxygen from water via the cloaca (waste disposal plus genital opening), but they still have to come to the surface to breathe air into their lungs.

Steller’s sea cow

Never parted from the buoyant support of water, whales are freed to evolve in massively (indeed so) different directions from their terrestrial ancestors. The blue whale is probably the largest animal that ever lived. Steller’s sea cows (pictured above), extinct relatives of dugongs and manatees, reached lengths of 11 metres and masses of 10 tonnes, larger than minke whales. They were hunted to extinction in the eighteenth century, soon after Steller first saw them. Like whales, sea cows breathe air, having failed to rediscover anything equivalent to the gills of their earlier ancestors. For reasons just discussed, that word ‘failed’ may be ill-advised.

Ichthyosaurs were reptilian contemporaries of the dinosaurs, with fins and streamlined bodies, and with powerful tails, which were their main engines of propulsion: like dolphins, except that ichthyosaur tails would have moved from side to side rather than up and down. The ancestors of whales and dolphins had already perfected the mammalian galloping gait on land, and the up-and-down motion of dolphin flukes was naturally derived from it. Dolphins ‘gallop’ through the water, unlike ichthyosaurs, who would have swum more like fish. Otherwise, ichthyosaurs looked like dolphins and they probably lived pretty much like dolphins. Did they leap exuberantly into the air – wonderful thought – wagging their tails like dolphins (but from side to side)? They had big eyes, from which we might guess that they probably didn’t rely on sonar as the small-eyed dolphins do. Ichthyosaurs gave birth to live babies in the sea, as we know from a fossil ichthyosaur who unfortunately died during the act of giving birth (see above). Unlike turtles, but like dolphins and sea cows, ichthyosaurs were fully emancipated from their terrestrial heritage. So were plesiosaurs, for there’s evidence that they were livebearers too. Given that viviparity has evolved, according to one authoritative estimate, at least 100 times independently in land reptiles, it seems surprising that sea turtles, buoyant in water but painfully heavy on land, still labour up the sands to lay eggs. And that their babies, when they hatch, are obliged to flap their perilous way down to the sea, running a gauntlet of gulls, frigate birds, foxes, and even marauding crabs.

Ichthyosaur died while giving birth

Sea turtles revert to land to lay their eggs, in holes that they dig in a sandy beach. And an arduous exertion it is, for they are woefully ill-equipped to move out of water. Seals, sea lions, otters, and many other mammals whom we’ll discuss in a moment, spend part of their time in water and are adapted to swimming rather than walking, which makes them clumsy on land, though less so than sea turtles. As already remarked, the same is true of penguins, who are champions in water but comically awkward on land. Galapagos marine iguanas are proficient swimmers, but they can manage a surprising turn of speed on land too, when fleeing snakes. All these animals show us what the intermediates might have been like, on the way to becoming dedicated mariners like whales, dugongs, plesiosaurs, and ichthyosaurs.

Tortles and turtoises – a tortuous trajectory

Turtles and tortoises are of special interest from the palimpsest point of view, and they deserve special treatment. But first I have to dispel a confusing quirk of the English language. In British common usage, turtles are purely aquatic, tortoises totally terrestrial. Americans call them all turtles, tortoises being those turtles that live on land. In what follows, I’ll try to use unambiguous language that won’t confuse readers from either of the two nations ‘separated by a common language’. I’ll sometimes resort to ‘chelonians’ to refer to the entire group.

Land tortoises, as we shall see, are almost unique in that their palimpsest chronicles a double doubling-back during the long course of their evolution. Their fish ancestors, along with the ancestors of all land vertebrates including us, left the sea in Devonian times, around 400 million years ago. After a period on land they then, like whales and dugongs, like ichthyosaurs and plesiosaurs, returned to the water. They became sea turtles. Finally, uniquely, some aquatic turtles came back to the land and became our modern dry-land (in some cases very dry indeed) tortoises. This is the ‘double doubling-back’ that I mentioned. But how do we know? How has the uniquely complicated palimpsest of land tortoises been deciphered?

We can draw a family tree of extant chelonians, using all available evidence including molecular genetics. The diagram below is adapted from a paper by Walter Joyce and Jacques Gauthier. Aquatic groups are shown in blue, terrestrial in orange. I’ve taken the liberty of colouring the ‘ancestral’ blobs blue when the majority of their descendant groups are blue. Today’s land tortoises constitute a single branch, nested among branches consisting of aquatic turtles.

This suggests that modern land tortoises, unlike most land reptiles and mammals, have not stayed on land continuously since their fish ancestors (who were also ours) emerged from the sea. Land tortoises’ ancestors were among those who, like whales and dugongs, went back to the water. But, unlike whales and dugongs, they then re-emerged back onto the land. I suppose this means I should reluctantly admit that American terminology has something going for it. As it turns out, what we British call tortoises are just sea turtles who turned turtle and returned to the land. They’re terrestrial turtles. No, I can’t do it. My upbringing leads me to go on calling them tortoises, but I’ll curb my tendency to wince at a phrase like ‘desert turtles’. In any case, what is interesting from the point of view of the genetic book of the dead is this: where reversals are concerned, land tortoises appear to have the most complicated palimpsests of all, with the largest number of almost perverse-seeming reversals.

Modern land tortoise

Moreover, it appears that our modern land tortoises may not be the first of their kind to achieve this remarkable double doubling-back. What looks like an earlier case occurred in the Triassic period. Two genera, Proganochelys and Palaeochersis, date way back to the first great age of dinosaurs, indeed long before the more spectacular and famous giant dinosaurs of the Jurassic and Cretaceous. It appears that they lived on land. How can we know? This is a good opportunity to return to our ‘future scientist’ SOF, faced with an unknown animal, and invite her to ‘read’ its environment from its skeleton. Fossils present the challenge in earnest because we can’t watch them living – whether swimming or walking – in their environment.

Proganochelys

So, what might SOF say of those enigmatic fossils, Proganochelys and Palaeochersis? Their feet don’t look like swimming flippers. But can we be more scientific about this? Joyce and Gauthier, whom we’ve already met, used a method that can point the way for anyone who wants to quantitatively decipher the genetic book of the long dead. They took seventy-one living species of chelonians whose habitat is known, and made three key measurements of their arm bones, the humerus (upper arm), the ulna (one of the two forearm bones), and the hand, as a percentage of total arm length. They plotted them on triangular graph paper. Triangular plotting makes convenient use of a theorem in Euclidean geometry. From any point inside an equilateral triangle, the lengths of perpendiculars dropped to the three sides always add up to the same value, the height of the triangle. This provides a useful technique for displaying three variables when the three are proportions that add up to a fixed number such as one, or percentages that add up to 100. Each coloured point represents one of the seventy-one species. The perpendicular distances of a point from each of the three sides of the big triangle represent its three skeletal measurements. And when you colour-code the species according to whether they live in water or on land, something significant leaps off the page. The coloured points elegantly separate out. Blue points represent species living in water, yellow points species living on land. Green points represent genera that spend time in both environments and they, satisfyingly, occupy the region between the blues and yellows.
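For readers who like to see the geometry at work, here is a minimal sketch in Python of how three percentages can be placed inside an equilateral triangle so that the perpendicular distances to the three sides read back the original proportions. It is not Joyce and Gauthier's own procedure. The function names, the placement of the triangle's corners, and the example percentages are all assumptions made purely for illustration; none of the numbers are real measurements.

```python
import math

HEIGHT = math.sqrt(3) / 2  # height of an equilateral triangle with side length 1


def ternary_to_xy(humerus, ulna, hand):
    """Place three non-negative parts inside a unit equilateral triangle.

    Assumed corner layout (an arbitrary choice for this sketch):
    A = (0, 0), B = (1, 0), C = (0.5, HEIGHT).
    """
    total = humerus + ulna + hand
    a, b, c = humerus / total, ulna / total, hand / total
    # Weighted average of the three corners, with weights a, b, c.
    x = b + 0.5 * c
    y = HEIGHT * c
    return x, y


def perpendicular_distances(x, y):
    """Perpendicular distances from (x, y) to sides BC, AC and AB."""
    d_bc = (math.sqrt(3) * (1 - x) - y) / 2  # side opposite corner A
    d_ac = (math.sqrt(3) * x - y) / 2        # side opposite corner B
    d_ab = y                                 # side opposite corner C
    return d_bc, d_ac, d_ab


# Made-up percentages (humerus, ulna, hand), for illustration only.
x, y = ternary_to_xy(45, 30, 25)
distances = perpendicular_distances(x, y)

# The three distances sum to the triangle's height wherever the point lies,
# and each distance divided by the height recovers one proportion.
print(sum(distances), HEIGHT)
print([round(100 * d / HEIGHT, 6) for d in distances])
```

Because the three distances always sum to the same height, a single point on the page carries all three proportions at once, which is what lets the aquatic and terrestrial species separate visibly on one plot.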

So now, the interesting question is, where do the two ancient fossil species, Palaeochersis and Proganochelys, fall? They are represented by the two red stars. And there’s little doubt about it. The red stars fall among the yellow points, the dry-land species of modern tortoises. They were terrestrial tortoises. The two stars fall fairly close to the green points, so maybe they didn’t stray far from water. This kind of method shows one way in which our hypothetical SOF might ‘read’ the environment of any hitherto unknown animal – and hence read the environment in which its ancestors were naturally selected. No doubt SOF will have more advanced methods at her disposal, but studies such as this one might point the way.

Palaeochersis and Proganochelys, then, were landlubbers. But had they stayed on land ever since their (and our) fishy ancestors crawled out of the sea? Or did they, like modern land tortoises, number sea turtles among their forebears? To help decide this, let’s look at another fossil. Odontochelys semitestacea lived in the Triassic, like Palaeochersis and Proganochelys but earlier. It was about half a metre long, including a long tail, which modern chelonians lack. The ‘Odonto’ in the generic name records the fact that it had teeth, unlike all modern chelonians, who have something more like a bird’s beak. And the specific name semitestacea testifies to its having only half a shell. It had a ‘plastron’, the hard shell that protects the belly of all chelonians, but it lacked the domed upper shell. The ribs, however, were flattened like those that support the shell in a normal chelonian.

The fossil was discovered in China and described by a group of scientists led by Li Chun. They believe Odontochelys, or something like it, is ancestral to all chelonians and that the turtle shell evolved ‘from the bottom up’. They referred to the Joyce and Gauthier paper on forelimb proportions and concluded that Odontochelys was aquatic. In case you’re wondering what was the use of half a shell, sharks (who have been around since long before any of this story) often attack from below, so the armoured belly might have been anti-shark. If we accept this interpretation, it again suggests that the chelonian shell evolved in water. Against land predators we would not expect that the breastplate should be the first piece of armour to evolve. Quite the reverse. Odontochelys was probably something like a swimming lizard, a sort of Galapagos marine iguana but armoured with a large ventral breastplate.

Although it’s controversial, the Chinese scientists favour the view that an aquatic turtle like Odontochelys, with its half shell, was ancestral to chelonians. Like all reptiles, it would have been descended from terrestrial, lizard-like ancestors, perhaps something like Pappochelys. If they are right that the chelonian shell evolved, Odontochelys-style, from the bottom up in shark-infested waters, what can we say about Palaeochersis and Proganochelys out on the land?

Odontochelys

It would seem that these represent an earlier emergence from water, an earlier incarnation of doubling-back terrestrial tortoises, to parallel today’s behemoths of Galapagos and Aldabra, who evolved from a later generation of aquatic turtles. In any case, the group we know as land tortoises stand as poster child for the very idea of an elaborate palimpsest. Not only did they leave the water for the land, return to water, and then double back onto the land again. They may even have done it twice! The doubling-back was achieved first by the likes of Proganochelys, and then again, independently, by our modern land tortoises. Maybe some went back to water yet again. It wouldn’t surprise me if some freshwater terrapins represent such a triple reversal, but I know of no evidence. Even one doubling-back is remarkable enough.

Pappochelys

If this giant Galapagos tortoise could sing a Homeric epic of its ancestors, its DNA-scored Odyssey would range from ancient legends of Devonian fishes, through lizard-like creatures roaming Permian lands, back to the sea with Mesozoic turtles, and finally returning to the land a second time. Now that’s what I call a palimpsest!

Giant Galapagos tortoise

Who Sings Loudest

I said in Chapter 1 that the palimpsest chapter would return to the question of the relative balance between recent scripts and ancient ones. It is time to do so. You might conjecture something like the scriptural rule for internal Koranic contradictions: later verses supersede earlier ones. But it’s not as simple as that. In the genetic book of the dead, older scripts of the palimpsest can amount to ‘constraints on perfection’.

Famous cases of evolutionary bad design, such as the vertebrate retina being installed back to front, or the wasteful detour of the laryngeal nerve (see below), can be blamed on historical constraints of this kind.

‘Can you tell me the way to Dublin?’

‘Well, I wouldn’t start from here.’

The joke is familiar to the point of cliché, but it strikes to the heart of our palimpsest priority question. Unlike an engineer who can go back to the drawing board, evolution always has to ‘start from here’, however unfavourable a starting point ‘here’ may be. Imagine what the jet engine would look like if the designer had had to start with a propellor engine on his drawing board, which he then had to modify, step by tinkering step, until it became a jet engine. An engineer starting with the luxury of a clean drawing board would never have designed an eye with the ‘photocells’ facing backwards, and their output ‘wires’ being obliged to travel over the surface of the retina and eventually dive through it in a blind spot on their way to the brain. The blind spot is worryingly large, although we don’t notice it because the brain, in building its constrained virtual reality model of the world, cunningly fills in a plausible replacement for the missing patch on the visual field. I suppose such guesswork could be dangerous if a hazard happened to fall on the blind spot at a crucial moment. But this piece of bad design is buried deep in embryology. To change it in order to make the end product more sensible would require a major upheaval early in the embryonic development of the nervous system. And the earlier in embryology it is, the more radical and difficult to achieve. Even if such an upheaval could at length be achieved, the intermediate evolutionary stages on the way to the ultimate improvement would probably be fatally inferior to the existing arrangement, which works, after all, pretty well. Mutant individuals who began the long trek to ultimate improvement would be out-competed by rivals who coped adequately with the status quo. Indeed, in the hypothetical case of reforming the retina, they would probably be totally blind.

You can call the backwards retina ‘bad design’ if you wish. It’s a legacy of history, a relic, an older palimpsest script partially over-written. Another example is the tail of humans and other apes, prominent in the embryo, shrunk to the coccyx in the adult. Also faintly traced in the palimpsest is our sparse covering of hair. Once useful for heat insulation, it is now reduced to a relic, still retaining its now almost pointless erectile properties in response to cold or emotion.

The recurrent laryngeal nerve in a mammal or a reptile serves the larynx. But instead of going directly to its destination, it shoots straight past the larynx, on its way down the neck into the chest, where it loops around a major artery and then rushes all the way back up the neck to the larynx. If you think of it as design, this is obviously rotten design. The length of the detour in the giant dinosaur Brachiosaurus would have been about 20 metres. In a giraffe it is still impressive, as I witnessed at first hand when, for a Channel Four documentary called Inside Nature’s Giants, I assisted in the dissection of a giraffe, who had unfortunately died in a zoo. Who knows what inefficiencies or outright errors might have resulted from the transmission delay that such a detour must have imposed. But natural selection is not wantonly silly. It wasn’t originally bad design in our fishy ancestors when the nerve in question went straight to its end organ – not larynx, for fish don’t have a larynx. Fish don’t have a neck either. When the neck started to lengthen in their land-dwelling descendants, the marginal cost of each small lengthening of the detour was small compared to what would have been the major cost of radically reforming embryology to re-route the nerve along a ‘sensible’ path, the other side of the artery. Mutant individuals who began the embryologically radical evolutionary journey towards re-routing the laryngeal nerve would have been out-competed by rival individuals who made do with the working status quo. There’s a very similar example in the routing of the tube connecting testis to penis. Instead of taking the most direct route, it loops over the tube connecting kidney to bladder: an apparently pointless detour. Once again, the bad design is a constraint buried deep in embryology and deep in history.

Recurrent laryngeal nerve

‘Buried deep in embryology and deep in history’ is another way of saying ‘buried deep under layers of younger scripts in the palimpsest’. Far from a ‘Koranic’ type of rule in which ‘Later trumps Earlier’, we might be tempted to toy with the reverse, ‘Earlier trumps Later’. But that won’t do either. The selection pressures that winnowed our recent ancestors are probably still in force today. So, to change the metaphor from a book to a cacophony of voices, the youngest voice, in its youthful vigour, might have something of a built-in advantage. Not an overriding advantage, however. I’d be content with the more cautious claim that the genetic book of the dead is a palimpsest made up of scripts ranging from very old to very young and including all intermediates between. If there are general rules governing relative prominence of old versus young or intermediate, they must wait for later research.

Biologists have long recognised morphological features that lie conservatively in basal layers of the palimpsest. An example is the vertebrate skeleton: the dorsally placed spinal column, with a skull and tail at the two ends, the column made of serially segmented vertebrae through which runs the body’s main trunk nerve. Then the four limbs that sprout from it, each consisting of a single, typically long bone (humerus or femur) connected to two parallel bones (radius/ulna, tibia/fibula); then a cluster of smaller bones terminating in five digits. It’s always five digits in the embryo, although in the adult some may be reduced or even missing. Horses have lost all but the middle digit, which bears the hoof (a massively enlarged version of our nail). A group of extinct South American herbivores, the Litopterns, included some species, such as Thoatherium (left), which independently evolved almost exactly the same hoofed limb as the horse (right). The two limbs have been drawn the same size for ease of comparison, but Thoatherium was considerably smaller than a typical horse, about the size of a small antelope. Think of the horse in the picture as a Shetland pony!

Litoptern (left) and horse (right)

Arthropods have a different Bauplan (building plan or body plan), although they resemble vertebrates in their segmented pattern of units repeated fore-and-aft in series. Annelid worms such as earthworms, ragworms, and lugworms also have a segmented body plan, and they share with arthropods the ventral position of the main nerve. This difference in position of the body’s main nerve has led to the provocative speculation that we vertebrates may be descended from a worm who developed the habit of swimming upside down – a habit that has been rediscovered by brine shrimps today. If this is so, the ‘basic’ vertebrate Bauplan may not be quite as basic as we thought.

Brine shrimp

But, important and even stately as such morphological bauplans are, morphology has become overshadowed by molecular genetics when it comes to reading the lower layers of biological palimpsests in order to reconstruct animal pedigrees. Here’s a neat little example. South American trees are inhabited by two genera of tree sloths, the two-toed and the three-toed. There was also a giant ground sloth, which went extinct some ten or twelve thousand years ago, just recently enough to supply molecular biologists with DNA. Since the two tree sloths are so alike, in both anatomy and behaviour, it was natural to suppose that they are closely related, descended from a tree-dwelling ancestor quite recently, and more distantly related to the giant ground sloth. Molecular genetics now shows, however, that the two-toed tree sloth is closer to the giant sloth – all 4 tonnes of it – than it is to the three-toed tree sloth.

Long before modern molecular taxonomy burst onto the scene, morphological evidence aplenty showed us that dolphins are mammals not fish, for all that they look and behave superficially like large fish – mahi-mahi are indeed sometimes called ‘dolphinfish’ or even ‘dolphins’. But although science long knew that dolphins and whales were mammals, no zoologist was prepared for the bombshell released in the late twentieth century by molecular geneticists when they showed, beyond all doubt, that whales sprang from within the artiodactyls, the even-toed, cloven-hoofed ungulates. The closest living cousins of hippos are not pigs, as I was taught as a zoology undergraduate. They are whales. Whales don’t have hooves to cleave. Indeed, their land ancestors probably didn’t actually have cloven hooves, but broad four-toed feet, as hippos do today. Nevertheless, they are fully paid-up members of the artiodactyls. Not even outliers to the rest of the artiodactyls but buried deep within them, closer cousins to hippos than hippos are to pigs or to other animals who actually have cloven hooves. A staggering revelation that nobody saw coming. Molecular gene sequencing may have other shocks in store for us yet.

Just as a computer disc is littered with fragments of out-of-date documents, animal genomes are littered with genes that must once have done useful work but now are never read. They’re called pseudogenes – not a great name, but we’re stuck with it. They are also sometimes called ‘junk’ genes, but they aren’t ‘junk’ in the sense of being meaningless. They are full of meaning. If they were translated, the product would be a real protein. But they are not translated. The most striking example I know concerns the human sense of smell. It is notoriously poor compared with that of coursing hounds, seal-hunting polar bears, truffle-snuffling sows, or indeed the majority of mammals. You’d be right to credit our ancestors with feats of smell discrimination that would amaze us if we could go back and experience them. And the remarkable fact is that the necessary genes, large numbers of them, are still with us. It’s just that they are never read, never transcribed, never rendered into protein. They’ve become sidelined as pseudogenes. Such older scripts of the DNA palimpsest are not only there. They can be read in total clarity. But only by molecular biologists. They are ignored by the natural reading mechanisms of our cells. Our sense of smell is frustratingly poor compared to what it could be if only we could find a way to turn on those ancient genes that still lurk within us. Imagine the high-flown imagery that mutant wine connoisseurs might unleash. ‘Black cherry offset by new-mown hay in the attack, with notes of lead pencil in the satisfying finish’ would be tame by comparison.

Hippos are closer cousins to whales than to any other ungulates

The analogy between genome and computer disc is a more than usually close one. If I invite my computer to list the documents on my hard disc, I see an orderly array of letters, articles, chapters of books, spreadsheets of accounts, music, holiday photos, and so on. But if I were to read the raw data as it is actually laid out on the disc, I would face a phantasmagoria of disjointed fragments. What seems to be a coherent book chapter is made up of here a scrap, there a fragment, dotted around the disc. We think it’s coherent only because system software knows where to look for the next fragment. And when I delete a document, I may fondly imagine it has gone. It hasn’t. It’s still sitting where it was. Why waste valuable computer-time to expunge it? All that happens when you delete a document is that the system software marks its territory on the disc as available to be over-written by other stuff, as and when the space is needed. If the territory is not needed it will not be over-written and the original document, or parts of it, will survive – legible but never actually read – like the smell pseudogenes that we still possess but don’t use. This is why, if you want to remove incriminating documents from your computer, you must take special steps to expunge them completely. Routine ‘deletion’ is not proof against hackers.
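The deletion behaviour described above can be imitated with a toy model. This is only a sketch under stated assumptions: `ToyDisc`, its block size, and its method names are invented for illustration and do not correspond to any real file system. Deleting a document merely removes its entry from the index and marks its blocks as free; the text stays on the disc until something else happens to overwrite it.

```python
class ToyDisc:
    """A hypothetical disc: fixed-size blocks plus an index of who owns which."""

    def __init__(self, n_blocks, block_size=4):
        self.block_size = block_size
        self.blocks = [""] * n_blocks      # raw contents of the disc
        self.free = set(range(n_blocks))   # blocks available for writing
        self.index = {}                    # document name -> list of block numbers

    def write(self, name, text):
        size = self.block_size
        chunks = [text[i:i + size] for i in range(0, len(text), size)]
        if len(chunks) > len(self.free):
            raise ValueError("disc full")
        used = []
        for chunk in chunks:
            block = min(self.free)         # take the lowest-numbered free block
            self.free.remove(block)
            self.blocks[block] = chunk     # overwrite whatever was there before
            used.append(block)
        self.index[name] = used

    def read(self, name):
        return "".join(self.blocks[b] for b in self.index[name])

    def delete(self, name):
        # Only the bookkeeping changes; the block contents are left untouched.
        self.free.update(self.index.pop(name))


disc = ToyDisc(n_blocks=8)
disc.write("secret", "incriminating")
disc.delete("secret")
print("index after delete:", disc.index)
print("raw blocks after delete:", disc.blocks)
disc.write("photos", "holiday!")
print("raw blocks after new write:", disc.blocks)
```

In this toy, the later write reuses only as many blocks as it needs, so part of the deleted text survives beside the new document, legible in the raw blocks but unreachable through the index.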

Pseudogenes are a lucid message from the past: a significant part of the genetic book of the dead. If she hadn’t already deduced it from other cues, SOF would know, from the graveyard of dead genes littering the genome, that our ancestors inhabited a world of smells richer than we can imagine. The DNA tombstones are not only there, the lettering on them is more or less clear and distinct. Incidentally, these molecular tombstones are a huge embarrassment to creationists. Why on earth would a Creator clutter our genome with smell genes that are never used?

This chapter has been mainly concerned with deep layers of the palimpsest, the legacies of more ancient history. In the next four chapters we turn to layers nearer the surface. This amounts to a look at the power of natural selection to override the deep legacies of history. One way to study this is to pick out convergent resemblances between unrelated animals. Another way is ‘reverse engineering’. To which we now turn.

4 Reverse Engineering

One of the central messages of this book – that the meticulously detailed perfection we see in the external appearance of animals pervades the whole interior too – obviously rests on an assumption that something approaching perfection is there in the first place. There, and to be expected on Darwinian grounds. It’s an assumption that has been criticised and needs defending, which is the purpose of the next three chapters.

The most prominent critics of what they called ‘adaptationism’ were Richard Lewontin and Stephen Gould, both at Harvard, both distinguished, in their respective fields of genetics and palaeontology. Lewontin defined adaptationism as ‘That approach to evolutionary studies, which assumes without further proof that all aspects of the morphology, physiology and behavior of organisms are adaptive optimal solutions to problems.’ I suppose I am closer to being an adaptationist than many biologists. But I did devote a chapter of The Extended Phenotype to ‘Constraints on Perfection’. I distinguished six categories of constraint, of which I’ll mention five here.
1. Time lags (the animal is out of date, hasn’t yet caught up with a changing environment). Quadrupedal relics in the human skeleton supply one example.
2. Historical constraints that will never be corrected (e.g. recurrent laryngeal nerve, back-to-front retina).
3. Lack of available genetic variation (even if natural selection would favour pigs with wings, the necessary mutations never arose).
4. Constraints of costs and materials (even if pigs could use wings for certain purposes, and even if the necessary mutations were forthcoming, the benefits are outweighed by the cost of growing them).
5. Mistakes due to environmental unpredictability or malevolence (e.g. when a reed warbler feeds a baby cuckoo it is an imperfection from the point of view of the warbler, engineered by natural selection on cuckoos).
If such constraints are allowed for and admitted, I think I could fairly be called an adaptationist. There remains the point, which will occur to many people, that certain ‘aspects of the morphology, physiology and behavior of organisms’ may be too trivial for natural selection to notice them. They pass under the radar of natural selection. If we are talking about genes as molecular geneticists see them, then it is probably true that most mutations pass unnoticed by natural selection. This is because they are not translated into a changed protein, therefore nothing changes in the organism. They are literally neutral, in the sense of the Japanese geneticist Motoo Kimura, not mutations at all in the functional sense. It’s like changing the font in which an instruction is printed, from Times New Roman to Helvetica. The meaning is exactly the same after the mutation as it was before. But Lewontin had sensibly excluded such cases when he specified ‘morphology, physiology and behavior’. If a mutation affects the morphology, physiology, or behaviour of an animal, it is not neutral in the trivial ‘changing the font’ sense.

Nevertheless, some people still have an intuitive feeling that many mutations are probably still negligible, even if they really do affect morphology, physiology, or behaviour. Even if there’s a real change visible in the animal’s body, mightn’t it be too trivial for natural selection to bother about? My father used to try to persuade me that the shapes of leaves, say the difference between oak shape and beech shape, couldn’t possibly make any difference. I’m not so sure, and this is where I tend to part company with the sceptics like Lewontin. In 1964, Arthur Cain (my sometime tutor at Oxford) wrote a polemical paper in which he forcefully (some might say too forcefully) argued the case for what he called ‘The Perfection of Animals’. On ‘trivial’ characters, he argued that what seems trivial to us may simply reflect our ignorance. ‘An animal is the way it is because it needs to be’ was his slogan, and he applied it both to so-called trivial characters and to the opposite – fundamental features like the fact that vertebrates have four limbs and insects have six. I think he was on firmer ground where so-called trivial characters were concerned, for instance in the following memorable passage:

But perhaps the most remarkable functional interpretation of a ‘trivial’ character is given by Manton’s work on the diplopod [a kind of millipede] Polyxenus, in which she has shown that a character formerly described as an ‘ornament’ (and what could sound more useless?) is almost literally the pivot of the animal’s life.

Even in those cases where the character is very close to being genuinely trivial, natural selection may be a more stringent judge than the human eye. What is trivial to our eyes may still be noticed by natural selection when, in Darwin’s words, ‘the hand of time has marked the long lapse of ages’. JBS Haldane made a relevant hypothetical calculation. He assumed a selection pressure in favour of a new mutation so weak as to seem trivial: for every 1,000 individuals with the mutation who survive, 999 individuals without the mutation will survive. That selection pressure is much too weak to be detected by scientists working in the field. Given Haldane’s assumption, how long will it take for such a new mutation to spread through half the population? His answer was a mere 11,739 generations if the gene is dominant, 321,444 generations if it is recessive. In the case of many animals, that number of generations is an eye-blink by geological standards. A relevant point is that, however seemingly trivial a change may be, the mutated gene has very many opportunities to make a difference – via all the thousands of individuals in whose bodies it finds itself over geological time. Moreover, even though a gene may have only one proximal effect, because embryology is complicated, that one primary effect may ramify. As a result, the gene appears to have many seemingly disconnected effects in different parts of the body. These different effects are called pleiotropic, and the phenomenon is pleiotropism. Even if one of a mutation’s effects was truly negligible, it’s unlikely that all its pleiotropic effects would be.
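A calculation of this kind can be approximated with a short simulation. What follows is a sketch under assumptions that the passage above does not state: an effectively infinite, randomly mating population with non-overlapping generations, a selective disadvantage of `s = 0.001` for individuals lacking the favoured trait, and a starting share of 0.001 per cent for the favoured type. The function name and parameters are hypothetical. The counts it prints should land in the neighbourhood of the figures quoted above, though exact agreement depends on details of Haldane's own model that are not given here.

```python
def generations_to_half(s, start_share, dominant):
    """Count generations until the favoured phenotype makes up half the population.

    s           : selective disadvantage of the unfavoured phenotype (fitness 1 - s)
    start_share : assumed initial share of individuals showing the favoured trait
    dominant    : True if the favoured allele is dominant, False if recessive
    """
    # Convert the starting share of the phenotype into an allele frequency p.
    if dominant:
        p = 1.0 - (1.0 - start_share) ** 0.5   # share = 1 - q^2
    else:
        p = start_share ** 0.5                 # share = p^2

    generations = 0
    while True:
        q = 1.0 - p
        share = 1.0 - q * q if dominant else p * p
        if share >= 0.5:
            return generations
        if dominant:
            # Only the double-recessive genotype pays the cost.
            mean_fitness = 1.0 - s * q * q
            p = p / mean_fitness
        else:
            # Every genotype except the favoured homozygote pays the cost.
            mean_fitness = 1.0 - s * (1.0 - p * p)
            p = (p * p + p * q * (1.0 - s)) / mean_fitness
        generations += 1


S = 0.001            # 999 survive for every 1,000 of the favoured type
START_SHARE = 1e-5   # assumption: the favoured type starts at 0.001 per cent

print("dominant: ", generations_to_half(S, START_SHARE, dominant=True))
print("recessive:", generations_to_half(S, START_SHARE, dominant=False))
```

The recessive case is so much slower because, while the allele is rare, almost every copy sits in a heterozygote where it has no visible effect and so gives selection nothing to act on.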

With all due recognition to the various constraints on perfection, I think a fair working hypothesis is one that, surprisingly, Lewontin himself expressed, admittedly long before his attacks on adaptationism: ‘That is the one point, which I think all evolutionists are agreed upon, that it is virtually impossible to do a better job than an organism is doing in its own environment.’

Some biologists prefer to say natural selection produces animals that are just ‘good enough’ rather than optimal. They borrow from economists the term ‘satisficing’, a jargon word that they love to namedrop. I’m not a fan. Competition is so fierce, any animal who merely satisficed would soon be out-competed by a rival individual who went one better than satisficing. Now, however, we have to borrow from engineers the important notion of local optima. If we think of a landscape of perfection where improvement is represented by climbing hills, natural selection will tend to trap animals on the top of the nearest relatively low hill, which is separated from a high mountain of perfection by an impassable valley. Going down into the valley is the metaphor for getting temporarily worse before you can get better. There are various ways, known to both biologists and engineers, whereby hill-climbers can escape local optima and make their way to ‘broad, sunlit uplands’, though not necessarily to the highest peak of all. But I shall leave the topic now.
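The trap of the local optimum is easy to demonstrate with a toy landscape. This is an illustrative sketch only: the list of heights and the `climb` function are invented, and the climber is the simplest possible one, allowed to step to a neighbouring position only if that position is strictly higher. It is not a model of any real population.

```python
def climb(heights, start):
    """Greedy hill-climbing: step to the higher neighbour until none is higher."""
    pos = start
    while True:
        neighbours = [i for i in (pos - 1, pos + 1) if 0 <= i < len(heights)]
        best = max(neighbours, key=lambda i: heights[i])
        if heights[best] <= heights[pos]:
            return pos          # no uphill step left: a peak, local or global
        pos = best


# A hypothetical landscape: a low hill (height 3), a valley, then a mountain (height 9).
landscape = [0, 1, 2, 3, 2, 1, 0, 1, 2, 4, 6, 9, 6, 3]

for start in (1, 8):
    peak = climb(landscape, start)
    print(f"start at {start}: stops at position {peak}, height {landscape[peak]}")
```

A climber that starts on the slopes of the low hill stops at its summit, because every route to the mountain begins with a step downhill, which this rule forbids.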

Engineers assume that a mechanism designed by somebody for a purpose will betray that purpose by its nature. We can then ‘reverse engineer’ it to discern the purpose that the designer had in mind.

Reverse engineering is the method by which scientific archaeologists reconstructed the purpose of the Antikythera mechanism, a mesh of cogwheels found in a sunken Greek ship dating from about 80 BC. The intricate gearing was exposed by modern techniques such as X-ray tomography. Its original purpose has been reverse engineered as an ancient equivalent of an analogue computer, designed to simulate the movement of heavenly bodies according to the system of epicycles later associated with Ptolemy.

Reverse engineering assumes that the object facing us had a purpose in the mind of a competent designer, a purpose that can be guessed. The reverse engineer sets up a hypothesis as to what a sensible designer might have had in mind, then checks the mechanism to see if it fits the hypothesis. Reverse engineering works well for animal bodies as well as for man-made machines. The fact that the latter were deliberately designed by conscious engineers while the former were designed by unconscious natural selection makes surprisingly little difference: a potential for confusion readily exploited by creationists with their characteristically eager appetite for it. The grace of a tiger and of its prey could not easily, it would seem, be bettered:

What immortal hand or eye
Could frame thy fearful symmetry?

Indeed, animals sometimes seem too symmetrically designed, to their own detriment: remember the owl pictured in Chapter 2.

Darwin had a section of Origin of Species called ‘Organs of extreme perfection and complication’. It’s my belief that such organs are the end products of evolutionary arms races. The term ‘armament race’ was introduced to the evolution literature by the zoologist Hugh Cott in his book on Animal Coloration published in 1940, during the Second World War. As a former officer in the regular army during the First World War, he was well placed to notice the analogy with evolutionary arms races. In 1979, John Krebs and I revived the idea of the evolutionary arms race in a presentation to the Royal Society. Whereas an individual predator and its prey run a race in real time, arms races are run in evolutionary time, between lineages of organisms. Each improvement on one side calls forth a counter-improvement on the other. And so the arms race escalates, until called to a halt, perhaps by overwhelming economic costs, just like military arms races.

Antelopes could always outrun lions, and vice versa, but only by counter-productive investment of too much ‘capital’ in leg muscles at the expense of other calls on investment in, say, milk production. If the language of ‘investment’ sounds too anthropomorphic, let me translate. Individuals who excel in running speed would be out-competed by slightly slower individuals who divert resources more usefully, from athletic legs into milk. Conversely, individuals who overdo milk production are out-competed by rivals who economise on milk production and put the energy saved into running speed. To quote the economists’ hackneyed saw, there’s no such thing as a free lunch. Trade-offs are ubiquitous in evolution.

I think arms races are responsible for every biological design impressive enough to, in the words of David Hume’s Cleanthes, ravish ‘into admiration all men who have ever contemplated them’. Adaptations to ice ages or droughts, adaptations to climate change, are relatively simple, less prone to ravish into admiration because climate is not out to get you. Predators are. So are prey, in the indirect sense that, the more success prey achieve at evading capture, the closer their would-be predators come to starvation. Climate doesn’t menacingly change in response to biological evolution. Predators and prey do. So do parasites and hosts. It is the mutual escalation of arms races that drives evolution to Cleanthean heights, such as the feats of mimetic camouflage we met in Chapter 2, or the sinister wiles of cuckoos that will amaze us in Chapter 10.

And now for a point that at first sight seems negative. Whereas animals look beautifully designed on the outside, as soon as we cut them open, we seem superficially to get a different impression. An untutored spectator of a mammal dissection might fancy it a mess. Intestines, blood vessels, mesenteries, nerves seem to spill out all over the place. An apparent contrast with the sinewy elegance of, say, a leopard or antelope when seen from outside. On the face of it, this might seem to contradict the conclusion of Chapter 2. The central point stated there was that the perfection typical of the outer layer must pervade every internal detail as well. Now compare your heart with the village pump, which seems neatly and simply fit for purpose. Admittedly, the heart is two pumps in one, serving the lungs on the one hand and the rest of the body on the other. But you could be forgiven for wondering whether a more minimally elegant pump might profitably have been designed.

Each eye sends information to the brain on the opposite side. Muscles on the left side of the body are controlled by the right side of the brain and vice versa. Why? I suppose we are again dealing with ancient scripts long buried in low strata of the palimpsest. Given such deep constraints, natural selection busily tinkers with the upper-level scripts, making good, as far as possible, the inevitable imperfections imposed by deeper levels. The backwards wiring of the vertebrate retina is well compensated by post-hoc making good. You might think that ‘from such warped beginnings nothing debonair can come’. The great German scientist Hermann von Helmholtz is said to have remarked that if an engineer had produced the eye for him, he would have sent it back. Yet after tweaking, ‘in post’ as movie-makers say, the vertebrate eye can become a fine piece of optical kit.

Two pumps

Why do animals look obviously well designed on the visible outside but apparently less so inside? Does the clue reside in that word ‘visible’? In the case of Chapter 2’s camouflage, and also ornamental extravaganzas like the peacock’s fan, (human) eyes are admiring the external appearance of the animal, and (peahen or predator) eyes are doing the natural selection of external appearance: similar vertebrate eyes in both cases. No wonder external appearance looks more perfectly ‘designed’ than internal details. Internal details are every bit as subject to natural selection, but they don’t obviously look that way because it is not selection by eyes.

That explanation won’t do for the streamlined flair of a sprinting cheetah, or its equally graceful Tommy prey. Those beauties did not evolve for the delectation of eyes but to satisfy the lifesaving requirements of speed. Here it would seem to be the laws of physics that impose what we perceive as elegance: as it is for the aerodynamic grace of a fast jet plane. Aesthetics and functionality converge on the same stylish elegance.

I confess that I find the interior of the body bewilderingly complex. I might even go so heretically far as to dismiss it as a mess. But I am a naive amateur where internal anatomy is concerned. A consultant surgeon whom I have consulted (what else should one do with a consultant?) assures me in no uncertain terms that, to his trained eye, internal anatomy has a beautiful elegance, everything neatly stowed away in its proper place, all shipshape and Bristol fashion. And I suspect that ‘trained eye’ is exactly the point. In Chapter 1, I contrasted the ear’s effortless deciphering of the spoken word ‘sisters’ with the eye’s fumbling impotence to see anything beyond a wavy line on an oscilloscope. My eye sees elegance on the outside. Then when I cut an animal open, my amateur eye contemplates only a mess. The trained surgeon sees stylish perfection of design, inside as well as out. It is, at least partly, the story of ‘sisters’ all over again. Yet there is more to be said. Something about embryology.

Veins, nerves, arteries, lymphatic system – a whole armful of complexity

The sceptic vocally doubts whether it can really matter whether this vein in the arm passes over or under that nerve. Maybe it doesn’t in the sense that, if their relationship could be reversed with a magic wand, the person’s life might not suffer, and might even improve. But I think it does matter in another sense – the sense that solved the riddle of the laryngeal nerve. Every nerve, blood vessel, ligament, and bone got that way because of processes of embryology during the development of the individual. Exactly which passes over or under what may or may not make a difference to their efficient working, once their final routing is achieved. But the embryological upheaval necessary to effect a change, I conjecture, would raise problems, or costs, sufficient to outweigh other considerations. Especially if the embryological upheaval strikes early. The intricate origami of embryonic tissue-folding and invagination follows a strict sequence, each stage triggering its successor. Who can say what catastrophic downstream consequences might flow from a change in the sequence – the kind of change necessary to re-route a blood vessel, say.

Moreover, perhaps Darwinian forces have worked on human perception to sharpen our appreciation of external appearances as opposed to internal details. At all events, I revert with confidence to the conclusion of Chapter 2. It is entirely unreasonable to suppose that the chisels of natural selection, so delicately adept at perfecting external and visible appearance, should suddenly stop at the animal’s skin rather than working their artistry inside. The same standards of perfection must pervade the interior of living bodies, even if less obviously to our eyes. To dissect the non-obvious and make it plain will be the business of future zoological reverse engineers, and it is to them that I appeal.

Ideally, reverse engineering is a systematic scientific project, perhaps involving mathematical models in the sense discussed in Chapter 1. More usually, at present at least, it involves intuitive plausibility arguments. If the object in question has a lens in front of a dark chamber, focusing a sharp image on a matrix of light-sensitive units at the back of the chamber, any person living after the invention of the camera can instantly divine the purpose for which it evolved. But there will be numerous details that will matter and will require sophisticated techniques of reverse engineering, including mathematical analysis. In this chapter our reverse engineering is mostly of the intuitive, common-sense kind, like the example of the eye and the camera.

Reverse engineering is supplemented by comparison across species. If SOF is confronted with a hitherto unknown animal, she can read it both by pure reverse engineering (‘a device designed by an engineer to do such-and-such would probably look rather like this’) and also by comparison with known species (‘this organ looks like an organ in so-and-so species that we already know, and it probably is used for the same purpose’).

An indirect version of reverse engineering can be used to infer aspects of an animal that cannot be seen, for example when all we have is fossils. We have no fossil evidence about the heart of a dinosaur. But fossils tell us that some sauropods such as Brontosaurus and the even larger Sauroposeidon had extraordinarily long necks. The CGI artists of Jurassic Park beautifully illustrated the dominant view that they reached up to browse tall trees. Like giraffes, only more so. Now the engineer steps in and invokes simple laws of physics to dictate that the heart would have had to generate very high pressure in order to push blood to the height of the animal’s brain when plucking leaves from a high tree. You can’t suck water through a straw that’s more than 10.3 metres tall, even if your sucking is powerful enough to generate a perfect vacuum in the straw. Sauroposeidon’s head probably overtopped its heart by about that much, which gives an idea of the pressure that the heart would have had to generate to push blood up to the head. Without ever seeing a fossilised sauropod heart, the engineer infers that it must have generated especially high pressure. Either that, or sauropods didn’t browse trees at all.
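For readers who like to see the arithmetic behind the straw limit, here is a minimal sketch of the hydrostatic reasoning. The constants are standard textbook values rather than figures from this chapter, the density of blood and the 10-metre head height are assumptions chosen for illustration, and the function names are my own.

```python
# Hydrostatic sketch: pressure needed to support a column of fluid.
# All constants are standard textbook values, not figures from the chapter.
G = 9.81                   # gravitational acceleration, m/s^2
RHO_WATER = 1000.0         # density of fresh water, kg/m^3
RHO_BLOOD = 1060.0         # assumed typical density of blood, kg/m^3
ATMOSPHERE_PA = 101_325.0  # one standard atmosphere, Pa
PA_PER_MMHG = 133.322      # pascals in one millimetre of mercury


def max_column_height(pressure_pa: float, density: float) -> float:
    """Tallest fluid column (m) that a given pressure difference can hold up."""
    return pressure_pa / (density * G)


def pressure_for_height(height_m: float, density: float) -> float:
    """Pressure (Pa) needed just to support a column of the given height."""
    return density * G * height_m


# A perfect vacuum at the top of a straw leaves one atmosphere pushing from below.
straw_limit_m = max_column_height(ATMOSPHERE_PA, RHO_WATER)

# Hypothetical sauropod: head ASSUMED to be 10 m above the heart.
HEAD_ABOVE_HEART_M = 10.0
extra_pa = pressure_for_height(HEAD_ABOVE_HEART_M, RHO_BLOOD)
extra_mmhg = extra_pa / PA_PER_MMHG

print(f"Straw limit: {straw_limit_m:.2f} m")
print(f"Extra pressure for a 10 m rise: {extra_mmhg:.0f} mmHg")
```

Under these assumptions the straw limit comes out at about 10.3 metres, and the hydrostatic burden of a 10-metre rise is roughly one atmosphere. That figure is only the cost of lifting the column; whatever pressure is needed at the head itself would come on top of it.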

I can’t resist reflecting that the difficulty of pumping blood to a head so high might have been partially responsible for those large dinosaurs outsourcing some brain functions to a second ‘brain’, in the pelvis. Also, I never miss an excuse to quote Bert Leston Taylor’s delightfully witty poem on the subject.

Behold the mighty dinosaur,
Famous in prehistoric lore,
Not only for his power and strength
But for his intellectual length.

You will observe by these remains
The creature had two sets of brains –
One in his head (the usual place),
The other at his spinal base,
Thus he could reason A priori
As well as A posteriori.
No problem bothered him a bit
He made both head and tail of it.
So wise was he, so wise and solemn,
Each thought filled just a spinal column.
If one brain found the pressure strong
It passed a few ideas along.
If something slipped his forward mind
’Twas rescued by the one behind.
And if in error he was caught
He had a saving afterthought.
As he thought twice before he spoke
He had no judgment to revoke.
Thus he could think without congestion
Upon both sides of every question.
Oh, gaze upon this model beast,
Defunct ten million years at least.

The pelvic ‘brain’ would have been about on a level with the heart, and impressively much lower than the head.

Alas, there are no living sauropods on which to test such ideas, and we must make do with the next best thing, which is the giraffe. Though not in the same league as a giant dinosaur, the giraffe’s head is quite lofty enough to require an abnormally high blood pressure, out of the ordinary for a mammal. And the following graph bears out the expectation.

I have plotted mean arterial blood pressure against the logarithm of body mass for a range of mammals from mouse to elephant. It’s best to use logarithms for the weights – otherwise it would be hard to fit mouse and elephant on the same page, with intermediate animals conveniently spread out between. The dotted line is the straight line that best fits the data. The line slopes upwards – larger animals tend to have higher blood pressure. Most species are pretty close to the line, meaning that their blood pressure is close to typical for their weight. But the big exception is the giraffe, which is far above the line. Its blood pressure is way higher than it ‘should be’ for an animal of its size. Surprisingly, other evidence shows that the giraffe heart is not especially large. It seems to be prevented from enlarging in evolution by the need to share the body cavity with large herbivorous guts. It achieves the extra-high blood pressure in a different way, by a greater density of heart muscle cells, an improvement that probably imposes costs of its own. Without ever seeing a Brontosaurus heart, we can predict that it too would have stood way above the line in the equivalent graph for reptiles.
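The logic of the graph can be sketched in a few lines: fit a straight line to blood pressure against the logarithm of body mass, then ask which animal sits furthest above the line. The numbers below are invented placeholders that merely mimic the pattern described in the text; they are not the measurements behind the graph, and the labels are deliberately generic.

```python
import math

# INVENTED placeholder values: (body mass in kg, mean arterial pressure in mmHg).
# They imitate the shape described in the text and are NOT real measurements.
animals = {
    "small_a": (0.02, 90.0),
    "small_b": (2.0, 95.0),
    "medium_a": (20.0, 100.0),
    "medium_b": (70.0, 102.0),
    "large_a": (500.0, 106.0),
    "large_b": (4000.0, 110.0),
    "tall_outlier": (1000.0, 200.0),  # stands in for a giraffe-like exception
}


def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


names = list(animals)
log_mass = [math.log10(animals[name][0]) for name in names]
pressure = [animals[name][1] for name in names]

slope, intercept = fit_line(log_mass, pressure)

# Residual = how far above (+) or below (-) the fitted line each animal lies.
residuals = {
    name: p - (intercept + slope * x)
    for name, x, p in zip(names, log_mass, pressure)
}
most_unusual = max(residuals, key=residuals.get)
print(f"slope = {slope:.1f} mmHg per tenfold increase in mass")
print(f"furthest above the line: {most_unusual}")
```

Taking logarithms is what lets a 20-gram animal and a 4-tonne animal share one axis: each step along it is a tenfold increase in mass. With these made-up numbers, the line slopes upwards and the giraffe-like placeholder stands well clear of it.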

The teeth of a hitherto unknown animal speak volumes, and this is fortunate because teeth, being necessarily hard enough to crunch food, are also hard enough to outlast anything else in the fossil record. Some important extinct species are known only from teeth. In the rest of this chapter, we shall use teeth and other biological food-processing devices as our example of choice. Look at this ancient skull. The first thing you notice is the scary canine teeth. You might reverse engineer these as being good for either fighting rivals or stabbing prey to death and holding onto them. Seeking further evidence, you might then look at the other teeth near the back of the jaw, the molars. They don’t mesh surface-to-surface in the way that ours or a horse’s do, but shear past each other like scissors as the jaws close. They seem designed to slice rather than to mill. This says ‘carnivore’. Well, obviously. But it’s only obvious because we are rather good at intuitive reverse engineering, and because we have living large carnivores like lions and tigers for comparison. It does no harm to make the reasoning explicit.

Sabretooth

Animals, perhaps because they are themselves made of meat, find meat relatively easy to digest, and carnivore intestines tend to be appropriately short. If SOF were handed an unknown animal, very long intestines would signal ‘herbivore’ to her. I’ll return to this. Meat, moreover, demands relatively little pre-processing with teeth before digestion. Cutting off substantial chunks to be swallowed whole is sufficient. Plants may be easier to catch than animals – they don’t run away – but they make up for it by being harder to process once you’ve caught them. Plant cells are different from animal cells. They have thick walls toughened by cellulose and silica. For this and other reasons, herbivores need to grind their food into tiny pieces before it is ready to pass into the gut for further breaking up chemically into even smaller pieces. Herbivore teeth are millstones which, like the mills of God, grind slowly and they grind exceeding small. Carnivore teeth don’t resemble millstones and they don’t grind. They cut, shearing through fibrous tissues.

Looking at the back teeth of the above skull, then, we confirm our initial diagnosis from the dagger-like canines, and convincingly reverse-engineer our scary specimen as telling a tale of ancestral carnivores. Moving to the rest of the skull, we note that the articulation of the lower jaw allows only up-and-down movement suitable for scissoring food, not side-to-side movement such as would be needed for milling. Up and down is putting it mildly: the sheer size of the gape is formidable. As you’ll have guessed, this is the skull of a sabretooth cat, often called sabretooth tiger, although it could just as well be called sabretooth lion. It was a big cat, Smilodon, not closer to any particular modern big cat than to any other. Contemporaneous with Smilodon, there were true lions in America, now extinct, bigger than Smilodon, bigger than African lions.

How did Smilodon use those formidable fangs? It’s notable that among modern carnivores, the cat family (Felidae) runs to long canine teeth more than the dog family (Canidae), despite the name ‘canine’ for the teeth. A plausible reason is as follows. Canids are mostly pursuit-hunters. They run their prey down to exhaustion. When they finally catch up with it, the poor spent creature is in no state to escape. Killing it is not a problem. Just start eating! Felids, on the other hand, tend to be stalkers and ambushers. Their prey, when they first pounce upon it, is fresh and in a strong position to escape. Either a swift killing stab or an inescapable grip is desirable, and long penetrating canines answer both needs. Among living cats, the clouded leopard sports the nearest approach to the sabres of Smilodon. Clouded leopards spend much of their time in trees and drop on their prey. Long, sharp daggers would be especially suited to subduing an animal taken by surprise from above, not ‘heated in the chase’ and in full possession of its powers.

Turning to other parts of the skull of Smilodon, we notice that the eye sockets point forward, indicating binocular vision, useful for pouncing on prey and no good for seeing danger creeping up from behind. Sabretooths had no need to watch their back. Herbivorous animals, whose ancestors became ancestors by virtue of noticing would-be killers, tend to have lookout eyes pointing sideways, giving almost 360° vision, calculated to spot a predator stalking from any direction.

Clouded leopard

So now, suppose you are presented with the skull below. It’s obviously very different. The eyes look sideways, as if scanning all around for danger while not being especially concerned with what is ahead. Probably an animal with a need to fear predation, then. The incisor teeth at the front look well suited to cropping grass. Most noticeable are the back teeth. They are broad grinders rather than sharp slicers, and they meet their opposite numbers in a precise fit when the jaws close. Their whole shape with its articulation is well suited to grinding plant food into very small pieces, again confirming the suspicion that this animal’s genes survived in a world of grass or other plant food. And the lower jaw, unlike that of Smilodon, moves sideways as well as up and down, a good milling action. This fossil is Pliohippus, an extinct horse that lived in the Pliocene, probably in mortal fear of Smilodon.

Pliohippus

The contrast between the skulls of the carnivorous sabretooth and the herbivorous horse is stark and clear. But not every case is so tidy. There was an animal called Tiarajudens, one of those we used to call a mammal-like reptile (nowadays we’d call it an early mammal), which flourished perhaps 280 million years ago, before the great age of dinosaurs. It had impressive sabretooth canines, much like Smilodon, which would seem to indicate a carnivorous diet similar to that of the formidable cat. But the back teeth suggest that, along with other animals to whom it was related, it was in fact a herbivore. So, we have a mismatch. Why would a creature with grinding back teeth have canine teeth like Smilodon? Perhaps Tiarajudens was a herbivore equipped with daggers for defence against predators. Or perhaps, like modern walruses, for fighting against rivals of its own species, as elephants use their gigantic tusks (elephant tusks are enlarged incisor teeth, not canines as in walruses).

Walrus

Walruses have been seen using their (upper canine) tusks to lever themselves out of the water and to make holes in the ice. Anyway, Tiarajudens stands as a warning against over-hasty reverse engineering that looks at only one thing, in this case the canine teeth.

Hedgehog

Some mammals such as shrews and small bats eat insects. Dolphins eat fish. Though these diets are technically carnivorous, their dental demands are different. Insectivorous teeth are neither grinders nor cutters but piercers. They tend to have sharp points, well suited to piercing the external skeletons of insects. If SOF’s unknown specimen sported piercing teeth like those of this hedgehog, she’d suspect that its ancestors survived on a diet of insects and other arthropods. And that is correct, though hedgehogs like earthworms too. Ants and termites are a special case (see below).

Common dolphin

Gavial

And now here’s the skull of a dolphin (top), and a gavial (bottom), to show typical fish-eating teeth and jaws. These two fish-eaters, a mammal and a crocodilian, have independently evolved pretty much the same dentition and jaw shape, an example of convergent evolution (which is the topic of Chapter 5). What’s the reverse-engineering explanation for this convergent resemblance? Fish-eaters, unlike, say, lions, are usually much larger than their prey. They don’t need to grind or cut or pierce their prey. Their prey is small enough to swallow whole. Long rows of small, pointed teeth are well equipped to grasp a slippery, soft fish and prevent it from escaping. And the slender jaws can snap shut on the fish without expelling a rush of water that might propel it out of harm’s way.

Ichthyosaur

If you were lucky enough to stumble upon a fossil like the above, you could apply the lesson of the previous paragraph: fish-eater. It’s an ichthyosaur such as we met in Chapter 3, a contemporary and relative of dinosaurs, member of a large group that went extinct somewhat earlier than the last of the dinosaurs. Both reverse engineering, and comparison with the dolphin and gavial pictures, speak to us loud and clear: its ancestors ate fish.

Killer whales (Orca) and sperm whales can be thought of as giant dolphins. They too eat prey smaller than themselves, and they too have long rows of dolphin-like teeth but hugely enlarged. Sperm whales have them only in the lower jaw (very occasionally in the upper jaw, and we may take this as a vestigial relic). Killer whales have them in both jaws. All other large whales, the so-called baleen whales, are filter feeders, sieving krill (crustaceans). They have no teeth at all (though, revealingly, their embryos have them and never use them). Their huge baleen filters are made of keratin, like hooves, fingernails, and rhinoceros horn. The reverse engineer would have no trouble in diagnosing a baleen whale as a trawler. Actually, they are better than trawlers, for they will target a huge aggregation of krill, and gulp it in with copious quantities of sea water, which is then forced out through the curtain of baleen, trapping the krill.

Ants and termites are colossally numerous. A specialist capable of penetrating an ant nest’s formidable defences can hoover up a bonanza of food denied to an ordinary insectivore like a hedgehog. And the dentition of such specialists is correspondingly specialised. For this purpose, by the way, termites are honorary ants. Mammals who preferentially eat ants and/or termites are all called anteaters. There’s a group of three South American mammals whose name in English is ‘anteater’: the Giant Anteater, the Lesser Anteater, and the Silky Anteater.

Giant Anteater

Tamandua

Giant Anteater

Pangolin

Armadillo

Echidna

The Giant Anteater’s scientific name, Myrmecophaga, is simply Greek for ‘anteater’. You will already have concluded that, since other mammals also specialise in eating ants, ‘Anteater’ is not a great name for a taxonomic group. I’ll use a capital letter for the three South American ‘Anteaters’ and a lower-case letter for other mammals who eat ants (or termites).

The South American Anteaters push the anteating habit to its extreme. The skulls of two of them, Tamandua and the Giant Anteater Myrmecophaga, are pictured at the top of the page opposite. Notice the extreme prolongation of the snout and the total absence of teeth. You’d hardly recognise the Giant Anteater’s skull as a skull at all. All anteaters show the same features, if to a lesser extent. The pangolin has no teeth and a moderately long snout. Armadillos have a longer snout and rather small teeth. The aardvark or antbear of Africa has back teeth, but no teeth at all along most of its long snout. Myrmecobius, the numbat, marsupial anteater of Australia, has a long, pointy head. It has teeth but doesn’t use them for eating except in infancy. Adults seem to use them only for gripping and preparing nest material.

Tachyglossus, the spiny anteater or echidna of Australia and New Guinea, is as distant as you can get from all the above while still being a mammal. It’s an egg-laying mammal like the platypus, a leftover from the ‘mammal-like reptiles’ of the ancient supercontinent of Gondwana. But unlike the platypus, with which it shares deep palimpsest features, it does, as its English name suggests, eat ants and termites. And its rather weird-looking skull does indeed have a long, slender snout and no teeth. Let’s not get carried away, however. A slightly longer snout is possessed by the related echidna genus, Zaglossus, and Zaglossus eats almost nothing but earthworms. Evidently, we must be careful before we jump too precipitately to the conclusion that ‘long snout’ necessarily means anteater. Anteating is not the only habit capable of writing ‘long snout’ in the palimpsest.

What else might SOF use to diagnose an animal as an anteater? Myrmecophaga, the Giant Anteater of South America, whose hugely elongated skull we have already seen, has a giant-sized sticky tongue, which it can protrude to a length of 60 cm, having deployed its formidable claws to break into an ant or termite nest. Huge numbers of the insects stick to the tongue and are drawn in before the tongue shoots out again. Despite its great length, the tongue flicks out and in again at high speed, more than twice per second. Though none can quite match Myrmecophaga, creditably long, sticky tongues are also found, convergently evolved, in aardvarks and the unrelated aardwolves, who, unlike other members of the hyaena family, specialise in eating termites. Pangolins, too, have convergently evolved a long sticky tongue. That of the giant pangolin can be 40 cm long and is attached way back near the pelvis, instead of to the hyoid bone in the throat as ours is. A pangolin can extend its tongue deep inside an ants’ nest, skilfully steering through the labyrinth of tunnels, turning left, turning right, leaving no subterranean avenue unexplored. Tamanduas also have a long sticky tongue but, in this case, its evolution was not independent of Myrmecophaga’s. They surely inherited the long tongue from their shared ancestor, also an anteater. The egg-laying spiny anteater too has a long, sticky tongue, and this time it really is convergent. As is that of the numbat, the marsupial anteater.

There are also physiological resemblances among anteating mammals, notably a low metabolic rate and low body temperature, convergently evolved enough times to impress our hypothetical SOF. However, a low metabolic rate is not exclusively diagnostic of an ant-eating habit. Sloths, befitting their name, also have a low metabolic rate. So do koalas, whom you could regard as a kind of marsupial equivalent of sloths. Both live up trees, eating relatively un-nutritious leaves, and both are slow moving, you might even say lethargic. The convergence doesn’t extend to both ends of the alimentary canal, however. Koalas defecate more than a hundred times per day, while sloths hold the record for the other extreme. They defecate about once per week, maybe because they laboriously climb down from the tree in order to do so.

Some of my reverse-engineering conjectures could be wrong. They are only provisional, to illustrate the point that the teeth of an animal, if properly read, will tell a story. In many cases, a story of ancient grassland prairies or leafy forests. Or, if the teeth resemble those of Smilodon or the clouded leopard, they speak to us of ambush and stalking. No doubt, if we could read them, every tooth we find could plunge us ever deeper into more specific, detailed stories. Teeth are enamelled archives of ancient history.

Teeth constitute the first food processor in the conveyor belt of digestion. The revealing differences between carnivores and herbivores continue on into the gut. Weight for weight, plants are not so nutritious as meat, so cows, for example, need to graze pretty continuously. Food passes through them like an ever-rolling stream, and they defecate some 40 or 50 kilograms per day. Plant stuff being so different from their own bodies, herbivores need help from chemical specialists to digest it. Those specialist chemists, some of whom were honing their skills perhaps a billion years before animals came on the scene at all, include bacteria, archaea (formerly classified as bacteria but actually far separated from them), fungi, and (what we used to call) protozoa. Ruminants such as cows and antelopes do their fermentation in a different way from horses and rabbits, and at different ends of the gut, but all rely on help from micro-organisms. As already mentioned above, herbivores have longer guts than carnivores, and their guts are complicated by elaborate blind alleys and fermentation chambers, specially fashioned to house symbiotic micro-organisms. Ruminants have the added complication of sending the food back for reprocessing by the teeth for a second time after it’s been swallowed – chewing the cud.

Herbivore gut

Carnivore gut

There is one bird, the hoatzin of South America, which eats nothing but leaves, the only bird to do so. And – an example of convergent evolution, the process we’ll meet in the next chapter – the hoatzin resembles ruminant mammals in having lots of little gut chambers in which are housed bacteria wielding the necessary chemical expertise to digest leaves. Incidentally, there’s a widely believed myth that the hoatzin is unique among birds in retaining ancient claws in the front of the wing, like the Jurassic ‘intermediate’ fossil Archaeopteryx. It’s true that hoatzin chicks have these primitive claws, but so do the chicks of many other birds, as David Haig pointed out to me. He went on to suggest that this mythic meme is popular among both biologists and creationists, who respectively want Archaeopteryx to be, and not to be, an ‘evolutionary intermediate’. No animal exists to be primitive for the sake of it, nor to serve as an evolutionary intermediate. The claws are useful to the chicks, who use them for clambering back into a tree when they fall.

Tiktaalik

By the same token, animals don’t exist for the sake of ‘moving on to the next stage in evolution’. The Devonian fossil Tiktaalik is widely touted as a transition between fish and land vertebrates. So it may be, but being transitional is not a way to earn a living. Tiktaalik was a living, breathing, feeding, reproducing creature, which should be reverse-engineered as such – not as a half-way stage on the way to something better.

What of our own teeth and jaws, our own guts, and those of our near relatives? What tales of long-gone ancestral meals do they tell? Comparison of our Homo sapiens lineage with extinct hominins such as Paranthropus (Australopithecus) robustus and boisei shows a marked trend over time towards shrinkage of both jaws and teeth in our sapiens lineage. The ribcage of those robust old hominins could accommodate a large vegetarian gut. They were evidently less carnivorous than we are, equipped with large plant-milling teeth, strong grinding jaws, and correspondingly powerful jaw muscles. Even though the muscles themselves have not fossilised, their bony attachments, sometimes culminating in a vertical (‘sagittal’) crest like a gorilla’s to increase their purchase, speak to us eloquently of generations of plant roughage. Our own jaw muscles don’t reach so high up the side of our head and we have no bony crest.

The primatologist Richard Wrangham has promoted the intriguing hypothesis that the invention of cooking was the key to human uniqueness and human success. He makes a persuasive case that our reduced jaws, teeth, and guts are ill-suited to either a carnivorous or a herbivorous diet unless a substantial proportion of our food is cooked. Cooking enables us to get energy from foods more quickly and efficiently. For Wrangham it was cooking that led to the dramatic evolutionary enlargement of the human brain, the brain being by far the most energy-hungry of our organs. If he’s right, it’s a nice example of how a cultural change (the taming of fire) can have evolutionary consequences (the shrinking of jaws and teeth).

Birds have no teeth, nor bony jaws. Surprising as it sounds, they may have lost them to save weight – an important concern in a flying animal – replacing them with light, horny beaks. The word ‘mandible’ is used for both parts of the beak – the upper mandible and the lower mandible. Beaks can tear but they can’t chew. Birds do the equivalent of chewing with the gizzard, a muscular chamber of the gut, often containing hard gastroliths – stones or grains of sand that the bird swallows to help with the milling process. Ostriches swallow appropriately large stones, up to 10 cm. Being flightless, they don’t have to worry so much about weight. Even larger stones found with fossil birds such as the giant moas of New Zealand are identified as gastroliths by their polished surfaces – polished by the grinding action in the gizzard.

1. Macaw

2. Crossbill

3. Spoonbill

4. Eagle

5. Skimmer

6. Hummingbird

Beaks vary greatly, and speak to us eloquently of different ways of procuring food. Their variety has been compared with the set of pliers in a mechanic’s toolkit. Pointed beaks delicately select small targets such as single seeds or grubs. Parrot beaks are robust nutcrackers or large seed crushers, and the curved upper mandible with its pointed tip is used as something like a hand. Caged parrots can often be seen climbing on the bars, levering themselves up with the beak as if it were a hand. In the wild they use the same trick in trees. Hummingbird beaks are long tubes for imbibing nectar. Imperious, hooked eagle beaks rip flesh from carcases. Woodpecker beaks hammer like high-powered pneumatic drills, pounding rhythmically into trees in search of larvae. They have specially reinforced skulls to cope with the shock of hammering. Flamingo beaks are upside-down filters for small crustaceans, the bird world’s nearest approach to the krill-sieving baleen of whales. Oystercatchers use their long, pointed beaks to chisel into mussels and other shellfish. Curlews use theirs to probe mud for worms and shellfish. Spoonbills have flat paddle-like bills that they sweep from side to side, at the same time using their feet to stir up mud and expose small animals lurking in it. Skimmer beaks are even more specialised. The lower mandible is longer than the upper. The bird flies close to the water with the mouth open and the tip of the lower mandible skimming the surface. When it hits a fish, the beak snaps shut, trapping the fish. Pelicans have a voluminous pouch of skin under the beak, which nets fish.

Nestling birds who are fed by their parents don’t need beaks to do anything other than gape. Their beaks are grotesquely wide, with brightly coloured linings – advertising surfaces garishly designed to out-compete their siblings for parental largesse. The huge difference from adult beaks of the same species reminds us that juvenile needs can be very different from adult ones, a principle writ large by caterpillars and butterflies, tadpoles and frogs, and many other examples where larval forms occupy a completely different niche from the adults they become.

GALAPAGOS FINCHES

Large ground finch

Medium ground finch

Small tree finch

Green warbler finch

Crossbills sport a weird crossover of upper and lower mandibles, which is helpful in prying apart the scales of pinecones. Insectivorous birds have differently shaped beaks from seed-eaters. And specialists on seeds of different sizes have correspondingly different beaks, the differences making total sense from a reverse-engineering point of view. The evolution of such differences is the subject of a beautiful and still proceeding long-term study of ‘Darwin’s Finches’ on one of the smaller Galapagos Islands by Peter and Rosemary Grant, and their collaborators.

Galapagos is matched as a Pacific island showcase of Darwinian evolution by the archipelago of Hawaii. Both island chains are volcanic and very young by geological standards. The biology of Hawaii differs in being more contaminated by humans, and by the other invasions for which humans are to blame. The evolutionary divergence of Hawaiian honeycreepers (below) shows a variety of beaks that outdoes even that of the Galapagos finches (above). There are eighteen surviving species (more than twice that number have gone extinct), all apparently descended from a single species of Asian finch, probably looking not unlike a Galapagos finch. The range of bill types that has evolved in such a short time is astonishing.

Some have retained the seed-eating habits of the ancestor, and still look finch-like with stout, stubby beaks. Others have modified their beaks for nectar-sipping, like African sunbirds rather than like New World hummingbirds. Yet others, with long downward-curving beaks, are probers for insects. Of these, the so-called ‘Akiapola’au (below) has a sharp, stout, stabbing lower mandible, which hammers into bark. Then the long curved upper mandible, which has been held out of the way during the hammering, comes into action to probe insects out of the cracks. The Maui parrotbill uses its powerful callipers to crush twigs and rip off bark in search of insects.

HAWAIIAN HONEYCREEPERS

Laysan finch

Kakawahie (extinct)

‘Akiapola’au

‘I’iwi

Heron beaks are long fishing spears, stabbing down into the water with sudden precision. The African black heron uses its wings to shade its field of view, which would otherwise be troubled by reflections from the rippling water surface. It dramatically sweeps its black wings across its body, laughably recalling a black-cloaked villain in Victorian melodrama. A separate problem for anyone spearfishing from above is refraction at the water surface – the illusion that makes oars look bent. There is some evidence that herons and kingfishers adjust their aim to compensate. The archer fishes of Southeast Asia face the same problem in reverse. They lurk under water and shoot insects sitting on tree branches above the surface, by squirting a sudden jet of water straight at the target. That’s remarkable enough in itself. Even more so, they seem to compensate for refraction, like herons but in the other direction.
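The refraction problem facing a spearfishing heron can be put in numbers with Snell's law. What follows is only a minimal sketch: the refractive indices are standard textbook approximations rather than figures from the text, the water surface is assumed flat, and the function names are hypothetical.

```python
import math

# Assumed refractive indices (textbook approximations, not taken from the text).
N_AIR = 1.000
N_WATER = 1.333


def angle_in_water_deg(angle_in_air_deg: float) -> float:
    """Snell's law: N_AIR * sin(theta_air) = N_WATER * sin(theta_water).

    Both angles are measured from the vertical, i.e. from the normal
    to a flat water surface.
    """
    sin_water = N_AIR * math.sin(math.radians(angle_in_air_deg)) / N_WATER
    return math.degrees(math.asin(sin_water))


def horizontal_offsets_m(depth_m: float, angle_in_air_deg: float) -> tuple[float, float]:
    """Return (apparent, true) horizontal distance of a fish, in metres,
    measured from the point where the line of sight meets the surface.

    'apparent' pretends the line of sight carries on straight into the water
    down to the fish's depth; 'true' follows the refracted ray instead.
    """
    apparent = depth_m * math.tan(math.radians(angle_in_air_deg))
    true = depth_m * math.tan(math.radians(angle_in_water_deg(angle_in_air_deg)))
    return apparent, true


# Illustrative numbers only: a fish 0.5 m down, sighted at 45 degrees from the vertical.
apparent_m, true_m = horizontal_offsets_m(0.5, 45.0)
print(f"ray angle in water: {angle_in_water_deg(45.0):.1f} degrees from vertical")
print(f"apparent offset {apparent_m:.2f} m, true offset {true_m:.2f} m")
```

Under these assumptions the fish lies closer in, and more steeply below, than a straight line of sight suggests, so a bird striking from above would need to aim short of where the fish appears to be. The archer fish meets the same bending at the same surface, but looking up from the water side.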

Archer fish

Reverse engineering, then, is one method by which we can read the body of an animal. Another method is to compare it with other animals, both related and unrelated. We used this method to some extent in this chapter. When the genetic books of unrelated animals spell the same message about their environment and way of life, we call it convergence. Convergent resemblances can be spectacular, as we’ll see in the next chapter.

5 Common Problem, Common Solution

This book’s main thesis is that every animal is a written description of ancestral worlds. It rests upon the hidden assumption – well, not so very hidden – that natural selection is an immensely powerful force, carving the gene pool into shape, deep down among the smallest details. As we saw in Chapter 2, among the most convincing evidence for the power of natural selection is the perfection of camouflage, the consummate detail with which some animals resemble their (ancestral) environment, or resemble an object in that environment. Equally impressive is the detailed resemblance of an animal to another, unrelated animal, because both have converged on the same way of life. Matt Ridley’s How Innovation Works documents how our greatest human innovations have been hit upon many times independently by inventors in different countries, working in ignorance of each other’s efforts. Just the same is true of evolution by natural selection. This chapter is about convergent evolution as an eloquent witness to the power of natural selection.

Despite appearances, the animal above opposite is not a dog. It is an unrelated marsupial, Thylacinus, the Tasmanian wolf (often called Tasmanian tiger, for no better reason than the stripes). In (what hindsight can now see as) a heinous crime against nature, the Tasmanian government in 1888 put a bounty on thylacine heads. The last one to be seen in the wild was infamously shot in 1930 by someone called Wilf Batty. He must have known it was almost extinct, though he couldn’t have known his victim was the last one. I suppose in 1930 people still didn’t care about such things, a poignant example of what I have called the shifting moral Zeitgeist. A captive specimen called Benjamin survived in Hobart Zoo until 1936. Thylacinus is one of the best-known examples of convergence. It looked like a dog because it had the same way of life as a dog. Its skull especially is so like a dog’s that it is a favourite trick question in zoology student examinations. Such a favourite, indeed, that in my year at Oxford they gave us a real dog skull as a double bluff, assuming that we’d automatically plump for Thylacinus.

Thylacine

Rhinoceros beetle

You’d never mistake this for a rhinoceros. But if you watched two rhinoceros beetles fighting, and then two rhinoceroses, you’d realise that convergent resemblances can vault over many orders of magnitude of body size. A fight is a fight is a fight, and a horn is a handy weapon at any size. The same goes for stag beetles and stags, with a somewhat dramatic embellishment. Stag beetles, but not stags, can lift their rivals high in the air on the prongs of their ‘antlers’.

Paca Chevrotain

On the left is a paca, a rodent from the rainforests of South and Central America. To its right is a chevrotain or ‘mouse deer’, an even-toed ungulate that lives in Old World forests. They resemble each other through convergence, because they have similar ways of life. In Africa, the niche is filled by a small ungulate; in South America, by a large rodent.

Armadillos are South American mammals, armoured against predators. When threatened, they roll up into a ball. The picture to the left shows the three-banded armadillo, which rolls up with especially compact elegance. In one of its illustrative quotations, the Oxford English Dictionary startlingly records that ‘Formerly the armadillo was used in medicine, being swallowed as a pill in its rolled-up state.’ Quite a stretch! Until you realise that ‘armadillo’ in this 1859 quotation referred not to the mammal but to a convergent crustacean, a woodlouse, whose Latin name Armadillidium means ‘little armadillo’. Armadillo itself is a Spanish word, a diminutive of armado or ‘armed’. So Armadillidium is a diminutive of a diminutive, a double diminutive. The commonality of name speaks to the power of convergent evolution. As befits its vernacular name of ‘pill bug’, in its rolled-up state you could indeed swallow a woodlouse whole, although as to its alleged medicinal value, I shall not comment. The mammalian armadillo and the crustacean Armadillidium have converged in their evolution, independently hitting on the same protective habit, albeit at very different sizes, rolling themselves into a ball.

The Latin language has the virtue of condensing into one word what might take three in a language such as English. Latin even has a specialised verb, glomero, meaning ‘I roll into a ball’ (from which we get English words like conglomerate and agglomerate). And Glomeris is the scientific name of yet another animal that rolls itself into a ball, and is also called ‘pill’ in vernacular English. It is not a crustacean but a millipede, the ‘pill millipede’, a member of the order Glomerida. As if that wasn’t enough, two different orders of millipede have independently converged on the roll-up pill body. In addition to the order Glomerida, members of the order Sphaerotheriida (Greek ‘spherical beast’) look just like Glomeris and indeed like Armadillidium, except that they are bigger.

Pill woodlouse Pill millipede

Pill woodlouse (above left) and pill millipede (above right) provide what may be my favourite example – in a strong field – of convergent evolution. They are almost indistinguishable when you see them crawling along, or when they roll into a ball. But the one is a crustacean, related to shrimps and crabs, while the other is a myriapod, related to centipedes. To make sure which is which, I have to turn them over. The crustacean has only one pair of legs per segment, making seven pairs in all. The millipede has many more legs, two pairs per segment. These two deeply different ‘pill’ animals look extremely alike in their surface palimpsest layers because they make their living in the same kind of way and in the same kind of place. Starting from widely separated ancestors they converged, in evolutionary time, on very similar end points.

Giant isopod

The deep palimpsest layers show that one is unmistakeably an isopod crustacean, the other a myriapod. Isopods are an important group of crustaceans, and they include members who grow to alarmingly large size on the sea bottom. We shall refer to them again in the next chapter, which goes to town on crustaceans.

Latin isn’t the only language to impress with its parsimony. The Malay noun pengguling means ‘one who rolls up’ and from it we get the name pangolin. We met the pangolin in the previous chapter. You might mistake it for a large, animated fir cone. It is not closely related to any other mammals but is out in its own order, Pholidota. That name comes from a Greek word meaning ‘covered with scales’, and an alternative English name for pangolin is ‘scaly anteater’. The scales are made of keratin, like hooves and fingernails. They aren’t as hard as the bony armour plates of armadillos.

However, when it comes to glomerising, pangolins perhaps outdo armadillos, pill woodlice, and pill millipedes. According to a report by a biologist on the island of Siberut in Indonesia, a pangolin ran away from him to the top of a steep slope, then formed itself into a ball and rolled down the slope at a speed of about 3 metres per second, twice as fast as a pangolin can run. The witness of this event interpreted the rolling down the hill as a normal response to predation. I reluctantly wonder if it might have been accidental.

There seems to be no doubt as to the effectiveness of rolling up as protection. Lions engage in futile endeavours to penetrate a pangolin’s defence. The pangolin’s enviable insouciance makes one wonder why other hunted animals don’t adopt the same strategy – the tortoise or armadillo strategy – instead of frantically fleeing. I suppose armour is expensive to make, but then so are long, well-muscled, fast-running legs. And it’s not a good argument – though possibly true – that if all antelopes, say, were to jettison speed for armour-plated roll-ups, lions on their side of the evolutionary arms race would come up with a counter-strategy. What might be a better argument is that the first individual antelopes to essay rudimentary, and still inadequate, armour would suffer compared with unencumbered rival antelopes disappearing in a cloud of dust.

Lion thwarted by pangolin

Two of the best-known examples of convergent evolution, too familiar to need detailed illustration yet again, are flight and eyes. The laws of physics allow the possibility of using energy to stay aloft for indefinite periods, and the wing has been independently and convergently invented five times: by insects, pterosaurs, birds, bats, and … human technology.

Eyes have been independently evolved many dozens of times, to nine basic designs. The convergent similarity between the camera, the vertebrate eye, and the cephalopod eye has become almost legendary. Here I’ll just mention that the most revealing difference – the vertebrate retina but not the mollusc one being wired up backwards – is a difference at a deep palimpsest level. This is another way of saying there’s a fundamental difference in their embryology. The vertebrate eye develops mostly as an outgrowth of the brain, while the cephalopod eye develops as an invagination from the outside. That difference lies deep down among the oldest palimpsest layers.

Compound eyes, a less familiar example of convergence, have also evolved independently several times. Some bivalve molluscs have a form of compound eye, as do some tube-dwelling annelid worms. These are convergent on each other and on the more highly developed compound eyes of crustaceans, insects, trilobites, and other arthropods. Camera eyes have one lens, which focuses an upside-down image on a retina. The image of a compound eye, if you can call it an image, is the right way up. Think hunting dragonfly, with its pair of large hemispheres, each a cluster of tubes radiating outwards in different directions. Whichever tube sees the target, that’s the direction to fly in order to catch it.

A familiar sight throughout both North and South America is the ‘turkey vulture’. It looks like a vulture, behaves like a vulture, lives the life of a vulture, feeding on carrion that it finds, like a vulture, with a sense of smell keener than is typical among birds. But it is not a vulture. Or rather, it has converged on vulturehood independently of true vultures. But wait, who is to say that Old World vultures are any more ‘true’ than New World turkey vultures? Americans might see the priority differently. Let us call both of them vultures, in enthusiastic recognition of convergent evolution and its impressive power to mislead.

We could settle much the same argument about which are the ‘true’ porcupines. Old World and New World porcupines are both rodents. But within the very large order of rodents, they are not particularly closely related, and they evolved their spiny defences independently. The two pictures show a leopard about to suffer the same punishment from an Old World porcupine as the dog has endured from a New World porcupine.

Contrary to legend, no porcupine shoots its quills. But they do have a quick-release mechanism so that a predator injudicious enough to molest a porcupine comes away with a face full of quills. New World quills prolong the agony by means of backward-facing barbs, which make them difficult to remove. This detail is not shared by the otherwise convergent Old World porcupines but it is convergent, at a much smaller scale, on the barbs of bee stings (American stingers).

Dog after approaching New World porcupine

Leopard approaching Old World porcupine

The sting of a bee, unlike a porcupine quill, is double. There are two barbed blades rubbing against each other with the venom running between them. The two move alternately against each other, sawing their way into the victim. Both are serrated with backward-pointing barbs like those on a New World porcupine quill. The sting is a modified ovipositor, a tube for egg-laying. Porcupine quills are modified hairs. Bees are not the only insects whose ovipositors are serrated. In cicadas (which don’t sting), the serrations, and the bee-like alternate sawing action of the two blades, serve to dig the ovipositor (egg-laying tube) into (for example) a tree, where the eggs are laid.

The sting of a bee, derived from the ovipositor and therefore possessed only by females, is a hypodermic syringe for injecting venom. The hypodermic venom injector has evolved convergently in ten different animal groups by my count (probably more than once independently in some groups): in insects, scorpions, snakes, lizards, spiders, centipedes, stingrays, stonefish, cone shells, and the hind-leg claw of the male duckbilled platypus. The stinging cells, ‘cnidoblasts’, of jellyfish are miniature harpoons that shoot out on the ends of threads, and inject venom. Among plants, stinging nettles have miniature hypodermic syringes.

The short spikes of hedgehogs are like the long quills of porcupines in being modified hairs. And these too have arisen independently at least three times. There are spiky tenrecs in Madagascar, which look remarkably like hedgehogs although they are not members of the same Order as hedgehogs. They are Afrotheres, related to elephants, aardvarks and dugongs. A third convergence is provided by the spiny anteaters of Australia and New Guinea. Egg-layers, they are as distant from hedgehogs and tenrecs as it is possible to be while still being mammals. They too are covered with spikes, again modified hairs.

We have seen that porcupine quills are a nice example of convergent evolution, independently arisen within the rodents. So-called flying squirrels also arose twice independently in different families of rodents, the true squirrels, and the so-called scaly-tails or anomalures. We know they evolved their gliding habit independently of each other because the closest relatives of both, within the rodents, are not gliders. It’s the same way we know New World and Old World porcupines are convergent, again within the large order of rodents.

Not surprisingly, the gliding skill has evolved convergently in a number of vertebrates. The picture shows four mammal examples, including the two rodents just mentioned. The colugo of the Southeast Asian forests is sometimes called the flying lemur, but it isn’t a lemur (all true lemurs come from Madagascar, though that’s not what makes the colugo a non-lemur) and it doesn’t really fly, although it is perhaps a more accomplished glider than the others in the picture. The sugar glider, although it looks extremely like a flying squirrel, is actually a marsupial from Australasia, one of several ‘flying phalangers’. Despite the startlingly close resemblance between sugar glider and flying squirrel, we know that one is a marsupial and the other a rodent, because of deeper layers of palimpsest. For example, the female phalanger has a pouch, the squirrel a placenta.

1. Colugo

2. Flying squirrel

3. Marsupial sugar glider

4. Anomalure
(Not to scale)

The Australian marsupial fauna provides many other examples of convergent evolution, of which perhaps the most famous is the extinct thylacine or Tasmanian wolf, already mentioned. The picture opposite shows a selection of comparisons between Australian marsupials and their placental equivalents in the rest of the world. These include a pair of anteaters and a pair of ‘mice’. The marsupial ‘mole’ of Australia resembles not only the familiar Eurasian mole but also the ‘golden mole’ of South Africa. Also very mole-like, among the rodents there are the zokors of Asia.

All these ‘moles’ independently adopted the same burrowing way of life, all have adapted their hands into powerful spades and all four look pretty alike. So convincing is the convergence that the golden moles were once classified as moles until it was realised that they belong to a radically different branch of (African) mammals, the Afrotheria, together with elephants, aardvarks, and manatees. Eurasian moles, by contrast, are Laurasiatheres, related to hedgehogs, horses, dogs, bats, and whales. Rodent zokors are related to the blind mole rats, who are thoroughly committed to subterranean life and look like moles, but, as you might expect from a rodent, they dig with their teeth rather than their hands. The family tree, overleaf, showing the affinities of four ‘moles’ is quite surprising.

PLACENTAL MAMMALS – MARSUPIALS
Dog – Thylacine
European mole – Marsupial mole
Mouse – Marsupial mouse
Flying squirrel – Sugar glider
Tamandua – Numbat

Independently evolved ‘moles’

Impressive as are the convergences of Australian marsupials with a whole variety of placental mammals, we mustn’t overlook the exceptions. Kangaroos don’t look very like the African antelopes with whom they share a way of life. They easily might have converged. But they didn’t. They diverged, mostly because they early committed themselves to a different gait for travelling fast. I suppose there was a time when the ancestors of either could have adopted the hopping gait of a kangaroo or the galloping gait of an antelope. Both gaits are fast and efficient, at least after many generations of evolutionary perfecting. But once an evolutionary lineage starts down a path like hopping or galloping, it is difficult to change. ‘Commitment’ really is a thing, in evolution. Once a lineage of mammals had advanced some way along the hopping gait path, any mutant that tried to gallop would have been out-competed. Perhaps its front legs were already too short. Conversely, in a lineage that was somewhat committed to galloping, a mutant that tried to hop would clumsily fail. There’s no rule that says placental mammals couldn’t have taken the kangaroo route. Indeed, there are rodents whose ancestors travelled that path very successfully. A colleague teaching zoology at the University of Nairobi said in a lecture that there were no kangaroos in Africa. This was denied by a student who excitedly claimed to have seen a small one. What he had seen was a springhaas or springhare, a rodent that looks and hops just like a wallaby, complete with foreshortened arms and enlarged, counterbalancing tail.

Springhare

If you could witness an ichthyosaur sporting in Mesozoic waves, you’d be irresistibly reminded of dolphins. A classic case of convergent evolution. On the other hand, your time machine might also present to you a plesiosaur. Far from looking like a dolphin or an ichthyosaur, it doesn’t resemble anything else you ever saw. Ichthyosaurs and plesiosaurs are both descended from land reptiles that went back to the sea. But they started out along, and then became ‘committed to’, alternative paths towards efficient swimming ‘gaits’. Ichthyosaurs rediscovered the ancient side-to-side tailbeat of their fish ancestors. They probably passed through a phase resembling the serpentine wavy motion of Galapagos marine iguanas. Plesiosaurs, instead, relied like sea turtles on their limbs, all four of which became huge flippers. Once committed, both ichthyosaurs and plesiosaurs became increasingly dedicated to their respective evolutionary pathways. And ended up looking extremely different.

Convergently evolved animals are not necessarily contemporaries. In North America in the Eocene epoch there were mole-like subterranean animals, the Epoicotheriids, with mole-like digging hands, not closely related to any living burrowers but belonging to the pangolin order, Pholidota. I’d be surprised if there weren’t dinosaur ‘moles’, but I must confess I don’t know of any. There were smallish dinosaurs such as Oryctodromeus who dug burrows, but I don’t know of any who could be called convergent on moles.

Then there were the so-called ‘false sabretooths’. We’ve already met Smilodon, the sabretooth ‘tiger’, that large, robust and doubtless frightening cat, which went extinct along with most of the American megafauna at the end of the Pleistocene epoch, only about 10,000 years ago, when man discovered America. What is less well known is that Smilodon was not the only member of the order Carnivora to evolve such terrifying fangs. Thirty million years earlier, spanning the Oligocene epoch, lived a group called Nimravids. The Nimravids were not cats but an older group within the Carnivora, and they independently evolved stabbing canine teeth just like those of Smilodon. Nimravids are sometimes called false sabretooths. False? Tell that to the early horse Mesohippus and the other terrified victims of those giant daggers. Those ‘false’ sabretooths were living, breathing, snarling, pouncing, probably strong-smelling carnivores, to whose victims they would have seemed anything but false. Another extinct group of ‘false sabretooths’, the Barbourofelids, lived in the Miocene epoch, later than the Nimravids but earlier than Smilodon, and convergently occupying the same niche.

‘False’ sabretooth – Nimravid

Given that the Carnivora have endowed us with three independently evolved sabretooths at different times in geological history, we might even feel a little let down if there were no marsupial sabretooth. And sure enough, South America rose to the occasion.

Marsupial sabretooth – Thylacosmilus

The marsupial Thylacosmilus looks to have been nearly as formidable as Smilodon and the other convergent sabretooths of the Carnivora. On the other hand, it was a bit smaller.

Convergences between animals and human technology can be especially impressive, as we saw in the case of the camera and the vertebrate or octopus eye. Though the discovery was originally thought an outrageous hoax, it is now well accepted that bats hunting by night have their own version – ‘echolocation’ – of what submariners have converged upon under the name ‘sonar’ – using echoes of their own sounds to detect targets. Bats are divided into two main groups, the small Microchiroptera, and the large Megachiroptera (‘fruit bats’ and ‘flying foxes’). Microchiropteran bats ‘see’ with their ears. They have highly sophisticated echolocation, good enough to hunt fast-flying insects. The brain pieces together a detailed model of the world, including insect prey, by a highly sophisticated real-time analysis of the echoes of the bats’ own shrieks. When a bat is cruising, its cries just tick over. But when homing in on a moth, which is likely to be taking evasive action, the sounds come out as a rapid-fire stutter like a machine gun. Since each pulse gives the bat an updated picture of the world, machine-gun repetition enables it to cope with a moth’s high-speed twists and turns. The higher the pitch, the shorter the wavelength by definition. And only short wavelengths can resolve a detailed picture. That means ultrasound: too high, mostly way too high, for us to hear. Young people can hear the lower end of the bat’s frequency range. I nostalgically remember those cries from my youth as sounding like something between a click and a squeak. We can use instruments called bat detectors, which translate ultrasound into audible clicks.
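The link between pitch and wavelength is simple arithmetic: wavelength equals the speed of sound divided by the frequency. Here is a minimal sketch, assuming a speed of sound in air of about 343 metres per second (a textbook figure for roughly 20 °C, not from the text) and using illustrative frequencies rather than measured bat calls.

```python
# Assumed value: speed of sound in air at about 20 degrees Celsius.
SPEED_OF_SOUND_AIR_M_PER_S = 343.0


def wavelength_m(frequency_hz: float,
                 speed_m_per_s: float = SPEED_OF_SOUND_AIR_M_PER_S) -> float:
    """Return the wavelength in metres of a sound of the given frequency."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return speed_m_per_s / frequency_hz


# Illustrative frequencies only (hypothetical examples, not data from the text).
for label, frequency_hz in [
    ("audible tone, 1 kHz", 1_000.0),
    ("near the top of young human hearing, 20 kHz", 20_000.0),
    ("ultrasonic pulse, 50 kHz", 50_000.0),
]:
    print(f"{label}: {wavelength_m(frequency_hz) * 1000:.1f} mm")
```

On these assumptions a 1 kHz tone has a wavelength of about a third of a metre, while a 50 kHz pulse comes in at a few millimetres, which is the scale of a small insect.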

Slightly less well known is the fact that dolphins and other toothed whales (sperm whales, killer whales) do the same thing, also using ultrasound, and they are up there with bats in sophistication. A more rudimentary form of echolocation has also evolved in shrews, and in cave-nesting birds at least twice independently: in South American oilbirds and Asian cave swiftlets (of bird’s-nest soup fame). The birds don’t use ultrasound: their cries are low enough for us to hear. Some megachiropterans also use a less precise form of echolocation, but they generate their clicks with their wings rather than with the voice. This too must be seen as yet another convergent evolution of echolocation. One genus of Megachiroptera echolocates using the voice, like Microchiroptera but not so skillfully. Interestingly, molecular evidence indicates that one group of Microchiroptera, the Rhinolophids, are more closely related to Megachiroptera than they are to other Microchiroptera. This would seem to suggest that the Rhinolophids evolved their advanced sonar convergently with the other Microchiroptera. Either that or the majority of Megachiroptera lost it.

Small bats and toothed whales are in a class of their own. Their sonar is of such high quality that ‘seeing with their ears’ scarcely exaggerates what they do. Echolocation using ultrasound provides them with a detailed picture of their world, which bears comparison with vision. We know this through experimental testing of bats’ ability to fly fast between thin wires without hitting them. I have even published the speculation (probably untestable, alas) that bats ‘hear in colour’. I stubbornly maintain that it’s plausible, because the hues that we perceive are internally generated labels in the brain, whose attachment to particular wavelengths of light is arbitrary. When bat ancestors gave up on eyes, substituting echoes for light, the internal labels for hues would have gone begging, left hanging in the brain with nothing to do. What more natural than to commandeer them as labels for echoes of different quality? I suppose you might call it an early exploitation of what some humans know as ‘synaesthesia’.

In one of modern philosophy’s most cited papers, Thomas Nagel didactically asked, ‘What is it like to be a bat?’ One of his points was that we cannot know. My suggestion is that it is perhaps not so very different from what it’s like to be us, or another visual animal like a swallow. Pursuing a point from Chapter 1, both swallows and bats build up an internal virtual reality model of their world. The fact that swallows use light, while bats use echoes, to update the model from moment to moment is less important than the nature and purpose of the internal model itself. This is likely to be similar in the two cases, because it is used for a similar purpose – navigation in real time between obstacles, and detection of fast-moving prey. Swallows and bats need a very similar internal model, a three-dimensional one, inhabited by moving insect targets. Both are champion insect hunters on the wing, swallows by day and then, at nightfall, the bats take over. If my speculation is right, the similarity may extend to the use of colours to label objects in the model, even in the case of bats ‘seeing with their ears’. Incidentally, each swallow eye has two foveas (regions of special acuity – our eyes have only one, which we use for reading etc.), probably one for distance and one for close vision. Instead of bifocal glasses they have bifocal retinas.

The James Webb Telescope presents us with stunning images of distant nebulae, glowing clouds of red, blue and green. Colour is used to represent wavelength of radiation. But the colours in the photographs are false. They use colour to represent different wavelengths, but those wavelengths actually lie in the invisible infrared part of the spectrum. And my point is that the brain’s convention for representing visible light of different wavelengths is just as arbitrary. One is tempted to feel dissatisfied by false colour images such as those from the James Webb Telescope: ‘But is that really what it looks like? Is the telescope telling the truth, or are we being fobbed off with false colours?’ The answer is that we are always being ‘fobbed off’ when we look at anything. If you must talk about false colours, everything you ever see – a rose, a sunset, your lover’s face – is rendered in the brain’s own ‘false’ colours. Those vivid or pastel hues are internal concoctions manufactured by the brain as coded labels for light of different wavelength. The truth lies in the actual wavelength of electromagnetic radiation. The perceived hue is a fiction, whether it is the false colour rendering of a James Webb photograph, or whether it is the labels that the brain generates to tag the wavelengths of light hitting the retina. My conjecture about bats ‘hearing in colour’ makes use of the same idea of internally perceived hues being arbitrary labels.

Doctors use ultrasound to ‘look’ through the body wall of a pregnant woman and see a black-and-white moving image of her developing foetus. The computer uses the ultrasound echoes to piece together an image compatible with our eyes. There is anecdotal evidence that dolphins pay special attention to pregnant women swimming with them. It seems plausible that they are doing with their ears what doctors do with their instruments. If this is so, they could presumably also ‘see’ inside female dolphins and detect which ones are pregnant. Might this skill be useful to male dolphins choosing mates? No point inseminating a female who is already pregnant.

Bats and dolphins evolved their echo-analysing skills independently of each other. In the family tree of mammals, both are enveloped by relatives who don’t do echolocation. A strong convergence, and another powerful demonstration of the power of natural selection. And now for a point that’s especially telling for the genetic book of the dead. There’s a type of protein called prestin, which is intimately involved in mammal hearing. It’s expressed in the cochlea, the snail-shaped hearing organ in the inner ear. As with all biological proteins, the exact sequence of amino acids in prestins is specified by DNA. And, also as is usual, the DNA sequence is not identical in different species. Now here’s the interesting point. If you construct a family tree of resemblance based on the genome as a whole, whales and bats are far apart, as you’d expect: their ancestors have been evolving independently of one another since way back in the age of dinosaurs. If, however, you ignore all genes except the prestin gene – if you construct a tree of resemblance based on prestin sequences alone – something remarkable emerges. Dolphins and small bats cluster together with each other. But small bats don’t cluster together with non-echolocating large bats, to whom they are much more closely related. And dolphins don’t cluster together with baleen whales, which, although related to them, don’t echolocate. This suggests that SOF could read the prestin gene of an unknown animal and infer whether it (more precisely its ancestors) lived and hunted in conditions where ultrasonic sonar would be useful: night, dark caves, or other places where eyes are useless, such as the murky water of the Irrawaddy river or the Amazon. I’d like to know whether the two echolocating bird species have bat-like prestins.
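The contrast between a whole-genome tree of resemblance and a single-gene one can be shown with a toy calculation. The sketch below is illustrative only. The letter strings are invented placeholders, not real genomic or prestin sequences. The resemblance measure (counting mismatched positions in equal-length strings) is an assumption chosen for simplicity, not the method used in the research described, and the function names are hypothetical.

```python
def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbour(name: str, sequences: dict[str, str]) -> str:
    """Return the other species whose sequence most resembles that of `name`."""
    others = (other for other in sequences if other != name)
    return min(others, key=lambda other: mismatches(sequences[name], sequences[other]))


# INVENTED toy strings standing in for 'the genome as a whole'.
TOY_GENOME_WIDE = {
    "small_bat":    "AAAAAAAACC",
    "large_bat":    "AAAAAAAAGG",
    "dolphin":      "TTTTTTTTCC",
    "baleen_whale": "TTTTTTTTGG",
}

# INVENTED toy strings standing in for a single hearing-related gene.
TOY_SINGLE_GENE = {
    "small_bat":    "GGCCGG",
    "large_bat":    "AATTAA",
    "dolphin":      "GGCCGA",
    "baleen_whale": "AATTCA",
}

for species in TOY_GENOME_WIDE:
    print(species,
          "| genome-wide neighbour:", nearest_neighbour(species, TOY_GENOME_WIDE),
          "| single-gene neighbour:", nearest_neighbour(species, TOY_SINGLE_GENE))
```

With these made-up strings, the genome-wide comparison pairs bat with bat and whale with whale, whereas the single-gene comparison pairs the two echolocators with each other. The toy data were constructed to give that result, so the sketch shows only the shape of the inference.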

This finding on bats and dolphins – the specific resemblance of their prestin genes – strikes me as a pattern for a whole field of future research on the genetic book of the dead. Another example concerns flight surfaces in mammals. Bats fly properly, and marsupial flying phalangers glide, using stretched flaps of skin that catch the air. There’s a specific complex of genes, shared by both bats and marsupial phalangers, which is involved in making the skin flaps. It will be interesting to know whether the same genes are shared by the other gliding mammals that we met earlier in this chapter, so-called flying lemurs and the two groups of rodents that independently evolved the gliding habit.

It would be nice to look in the same kind of way at those animals who have returned from land to water – of which whales are only the most extreme example, along with dugongs and manatees. Do returnees to water have genes in common that are not shared by non-aquatic mammals? What other features do they share? Many aquatic mammals and birds have webbed feet. If our hypothetical SOF is presented with an unknown animal who has webbed feet, she can safely ‘read’ the feet as saying, ‘Water in the recent ancestral environment.’ But that’s obvious. Can we be systematic in our search for less obvious signals of water in the genetic book of the dead? How many other features are diagnostic of aquatic life? Are there some shared genes, such as we saw in the case of prestin for sonar, and skin flaps in bats and sugar gliders? There are probably lots of shared features buried deep in an aquatic animal’s physiology and genome. We have just to find them. We can get a sort of negative clue by looking at genes that were made inactive when terrestrial animals took to the water. Just as humans have a large number of smell genes inactivated (see here), whale genomes contain several inactivated genes, whose inactivation has been interpreted as beneficial when diving to great depths.

We could proceed along the following lines. We borrow from medical science the technique known as GWAS (genome-wide association study). The idea of GWAS is lucidly and conversationally explained by Francis Collins, former Director of the Human Genome Project, as follows:

What you do for a genome-wide association study is find a lot of people who have the disease, a lot of people who don’t, and who are otherwise well matched. And then, searching across the entire genome … you try to find a place where there is a consistent difference. And if you’re successful – and [you’ve] got to be really careful about the statistics here, so that you don’t jump on a lot of false positives – it allows you to zero in on a place in the genome that must be involved in disease risk without having to guess ahead of time what kind of gene you’re going to find.

Substitute ‘lives in water’ for ‘disease’, and ‘species’ for ‘people’, and you have the procedure I am here advocating. Let’s call it ‘Interspecific GWAS’ or IGWAS.

Gather a large number of mammals known to be aquatic. Match each one with a related mammal (the more closely related the better) who lives on land, preferably in dry conditions. We might start with the following list of matched pairs, and the list could be extended.

Aquatic          Terrestrial counterpart
Water vole       Vole
Water shrew      Shrew
Desman           Mole
Platypus         Echidna
Water tenrec     Land tenrec
Otter            Badger
Seal             Wolf
Yapok            Opossum
Polar bear       Brown bear

To do the IGWAS, you would now look at the genomes of all the animals and try to pinpoint genes shared by the left-hand column and not by the right-hand column. Until all those animals have had their genomes sequenced, and until mathematical techniques are up to the task, proceed with a non-genomic version of IGWAS as follows. Go to work taking measurements of all the animals. Measure all the bones. Weigh the heart, the brain, the kidneys, the lungs, etc., all these weights being expressed relative to total body weight (to correct for absolute size, which is unlikely to be of much interest). By the same token, the bone measurements should be expressed as a proportion of something, just as, in the chelonian example of Chapter 3, the bone lengths were expressed as a proportion of total arm length. Measure the body temperature, blood pressure, the concentrations of particular chemicals in the blood, measure everything you can think of. Some of the measurements might not be continuously varying quantities like centimetres or grams: they might be ‘yes or no’, ‘present or absent’, ‘true or false’.

Feed all the measurements into a computer. And now for the interesting part. We want to maximise the discrimination between aquatic mammals and their terrestrial opposite numbers. We want to discover which measurements discriminate them, pull them apart. At the same time, we want to identify those features that unite all aquatic mammals, however distantly related from each other. Webbing between the toes will presumably emerge as a good discriminator, but we want to find the non-obvious discriminators, biochemical discriminators, ultimately gene discriminators. Where genomic comparisons are concerned, the GWAS methods already developed for medical purposes will serve. A possible graphic method is a version of the triangular plot of tortoise and turtle limbs that we saw in Chapter 3. Another graphic method is drawing pedigrees with genetic convergences coloured in.
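The non-genomic version of the procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the species labels (aq1, land1, and so on), the trait names and every number below are invented placeholders, not real measurements, and the exact sign test is a simple stand-in chosen here for the kind of statistics the matched-pair design invites, not a method taken from the GWAS literature. For each trait it asks whether the aquatic member of a pair differs from its terrestrial partner in the same direction across all pairs.

```python
from math import comb


def sign_test_p(n_pos, n_neg):
    """Two-sided exact sign test p-value; ties are ignored by the caller."""
    n = n_pos + n_neg
    if n == 0:
        return 1.0
    k = min(n_pos, n_neg)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def rank_discriminators(pairs, traits):
    """Rank traits by how consistently aquatic and terrestrial partners differ.

    pairs  : list of (aquatic_id, terrestrial_id) matched pairs
    traits : dict mapping species id -> dict of trait name -> value
             (values assumed already scaled, e.g. relative to body weight)
    """
    features = sorted({name for values in traits.values() for name in values})
    results = []
    for feature in features:
        diffs = [traits[a][feature] - traits[t][feature] for a, t in pairs]
        n_pos = sum(d > 0 for d in diffs)
        n_neg = sum(d < 0 for d in diffs)
        results.append((sign_test_p(n_pos, n_neg), feature, n_pos, n_neg))
    results.sort()  # smallest p-value (most consistent difference) first
    return [(feature, n_pos, n_neg, p) for p, feature, n_pos, n_neg in results]


# Hypothetical placeholder data: NOT real measurements of any animal.
PAIRS = [("aq1", "land1"), ("aq2", "land2"), ("aq3", "land3"), ("aq4", "land4")]
TRAITS = {
    "aq1": {"webbing": 1, "trait_x": 0.30}, "land1": {"webbing": 0, "trait_x": 0.35},
    "aq2": {"webbing": 1, "trait_x": 0.50}, "land2": {"webbing": 0, "trait_x": 0.40},
    "aq3": {"webbing": 1, "trait_x": 0.20}, "land3": {"webbing": 0, "trait_x": 0.25},
    "aq4": {"webbing": 1, "trait_x": 0.45}, "land4": {"webbing": 0, "trait_x": 0.40},
}

if __name__ == "__main__":
    for feature, n_pos, n_neg, p in rank_discriminators(PAIRS, TRAITS):
        print(f"{feature}: higher in aquatic {n_pos}x, lower {n_neg}x, p = {p:.3f}")
```

With only four toy pairs, even a trait that differs the same way in every pair cannot reach a small p-value, which is one reason the list of matched pairs would need to be extended. A serious analysis would also have to guard against the false positives Collins warns about, since many traits are being tested at once.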

A refinement of IGWAS might order species along an ecological dimension. You could, perhaps, string mammals out along a dimension of aquaticness, from whales and dugongs at one extreme to camels, desert foxes, oryxes, and gundis at the other. Seals, otters, yapoks and water voles would be intermediate. Or we might explore a dimension of arboreality. We might conclude that a squirrel is a rat who has moved a measurable distance along the dimension of arboreality. Are moles, golden moles and marsupial moles situated at one extreme on a dimension of fossoriality? Could we distribute birds along a dimension from flightless cormorants and emus who never fly, at one extreme, to albatrosses at the other, or, even more extreme, to swifts, who even copulate on the wing? Having identified such ‘dimensions’, could we look for trends in gene frequency as you move along from one extreme to the other? I can immediately foresee alarming complications. The dimensions would interact with other dimensions, and we’d have to call in experts with mathematical wings to fly through multi-dimensional spaces. My own sadly amateur ventures, limited to three dimensions, and using computer simulation rather than mathematics, are in my book Climbing Mount Improbable, especially the chapter called ‘The Museum of All Shells’.

A group at Carnegie Mellon University in Pittsburgh performed a model example of what I call (they don’t) IGWAS. What they studied was not aquaticness but hairlessness in mammals. Most mammals are hairy, and all had hairy ancestors, but if you survey the mammal family tree you notice that hairlessness pops up sporadically among unrelated mammals. See the diagram, which shows a few of the sixty-two species whose genomes were examined.

Sporadic distribution of hair loss among mammals

Whales, manatees, pigs, walruses, naked mole rats, and humans have all lost their hair more or less completely (yellow names in the diagram). And, which is important, independently of each other in many cases. We can tell this by looking at the hairy closer relatives from among whom they sprang. You remember that echolocating bats and echolocating whales had something else in common – their prestin gene. Do the genomes of the naked species have a gene for hairlessness that they share with each other? The answer is literally no. But only literally. The truth is equally interesting. It turns out that we and other naked species still retain the ancestral genes that make hairs. But the genes have been disabled. And disabled in different ways. What is convergent is the fact of being disabled, but the details are not shared. Incidentally, we again have here a problem for creationists. If an intelligent designer wished to make a naked animal, why would he equip it with genes for making hair and then disable them? Chapter 3 mentions the similar example of the human sense of smell: the olfactory sense genes of our mammal ancestors still lurk within us, but they have been turned off.

One of my favourite examples of convergent evolution is that of weakly electric fish. Two separate groups of fish, Gymnotids in South America and Gymnarchids in Africa, have independently and convergently discovered how to generate electric fields. They have sense organs all along the sides of the body, which can detect distortions that objects in the environment cause in the electric fields. It is a sense of which we can have no awareness. Both groups of fish use it in murky water where vision is impossible. There’s just one difficulty. The normal undulating movements typical of fish fatally compromise the analysis of the electric fields measured along the body. It is necessary for the fish’s body to maintain a rigid stance. But if their body is rigid, how do they swim? By means of a single longitudinal fin traversing the whole length of the body. The body itself, with its row of electrical sensors, stays rigid, while the single longitudinal fin alone performs the sinuous movements typical of fish locomotion. But there’s one revealing difference. In the South American fish, the longitudinal fin runs along the ventral surface, while in the African fish it runs along the back. In both groups of fish, the undulating waves can be thrown into reverse: the fish swim backwards and forwards with apparently equal facility.

The ‘duck bill’ of the platypus and the huge, flat ‘paddle’ sticking out of the front end of the paddlefish (Polyodontidae) are both covered with electrical sensors, convergently and independently evolved. In this case the electric fields they pick up are generated, inadvertently, by the muscles of their prey. There is a long-extinct trilobite that also had a huge paddle-like appendage like that of the paddlefish. Its paddle was studded with what look like sense organs, and it seems probable that this represents yet another convergence.

A ringed plover’s eggs and chicks lie out on the ground, defenceless except for their camouflage. A fox approaches. The parent is much too small to put up any kind of resistance. So it does an astonishing thing. It attempts to lure the predator away from the nest by offering itself as a bigger prize than the nest. It limps away from the nest, pretending to have a broken wing, simulating easy prey. It flutters pathetically on the ground, wings outstretched, sometimes with one wing stuck incongruously in the air. There’s no assumption that it knows what it is doing or why it is doing it (although it may). The minimal assumption we need make is that natural selection has favoured ancestors whose brains were genetically wired up to perform the distraction display, and perfect it over generations. Now, why tell the story in this chapter on convergent evolution? It’s because the broken wing display has arisen not once but many times independently in different families of birds. The diagram on the following page is a pedigree of birds, wrapped around in a circle so it fits on the page. Birds who perform the broken wing display are coloured in red, those who don’t in blue. You can see that the habit is distributed sporadically around the pedigree, a lovely example of convergent evolution.

My final example of convergence will lead us into the next chapter. More than 200 species, belonging to thirty-six different fish families, practise the ‘cleaner’ trade. They remove surface parasites and damaged scales from the bodies of larger ‘client’ fish. Each individual cleaner fish has its own cleaning station, and its own loyal clients who return repeatedly to the same ‘barber’s shop’ on the reef. This site tenacity is important in keeping the benefit exclusively mutual: the cleaner eats the parasites and worn-out scales from the skin of particular client fish, and the client refrains from eating its particular benefactor. Without individual site fidelity, and therefore repeat visits, clients would have no incentive to refrain from eating the cleaner – after being cleaned, of course. Sparing a cleaner would benefit fish in general, including competitors of the sparer. Natural selection doesn’t ‘care’ about general benefit. Quite the contrary. Natural selection cares only about benefit to the individual and its close relations, at the expense of competitors. A bond of individual loyalty between particular cleaner and particular client therefore really matters, and it is achieved by site tenacity. Some cleaners even venture inside the mouth of a client to pick its teeth – and survive to repeat the service on the client’s next visit. Cleaner fish advertise their trade and secure their safety by a characteristic dance, often enhanced by a striped pattern – the fishy equivalent of the striped pole insignia of a human barber’s shop. This constitutes a safe-conduct pass.

Broken wing display

The remarkable ‘broken wing display’ crops up again and again in different bird groups (shown in red). Striking testimony to the power of natural selection.

The interesting point for this chapter is that the cleaner habit has evolved many times convergently, not only many times independently in fish but many times in shrimps too. As before, the client fish abide by the covenant and refrain from eating their cleaner shrimps, in just the same respectful way as for cleaner fish. In many cases, cleaner shrimps sport a similar stripe, the ‘barber’s pole’ insignia. It is to the benefit of all that all the ‘barber’s pole’ badges should look similar.

When swimming in the sea, you would be well advised to steer clear of the sharp-toothed jaws of the moray eel. Yet here is a shrimp, calmly picking its teeth. Note, yet again, the red stripe or ‘barber’s shop pole’, telling the moray, ‘Don’t eat me, I’m your special cleaner. You and I have a mutual relationship. You’ll need me again.’ Does the shrimp feel fear as it trustingly enters those formidable jaws? Does some equivalent of ‘trust’ pulsate through its cephalic ganglion? I doubt it, but not everyone would agree. Do you?

Moray eel and cleaner shrimp

Not only has the habit evolved independently – convergently – in fish and shrimps. It has evolved convergently many times within shrimps, just as it has many times within fish. Even within one family of shrimps, the Palaemonidae, the cleaner trade is practised by sixteen different species, having evolved within the Palaemonidae five times independently. Here’s how we know the five evolutions were independent of each other. The method again serves as a model for how we ever know instances of evolution are independent of each other. Look at the family tree of the Palaemonidae, constructed with the aid of molecular genetic sequencing. It contains sixty-eight species of shrimp. Those species that practise the fish-cleaning trade have a little fish symbol by them. There are sixteen species of palaemonid cleaner shrimps. But many of the sixteen cannot be said to have evolved the habit independently. For example, the three species of Urocaridella are all cleaners, but the picture warns us against counting them as independent: they probably inherited it from their common ancestor.

Six members of the genus Ancylomenes are cleaners, but again we must make the conservative assumption that they inherited it from their common ancestor – and that the habit has been lost in A. aqabai, A. kuboi, A. luteomaculatus, and A. venustus. Using this conservative approach, we conclude that the cleaning habit evolved independently in five palaemonid genera but not in all species of those five genera. And the story doesn’t end with the Palaemonidae. Two other families of shrimps not shown in the diagram, the Hippolytidae (see moray eel picture above) and the Stenopodidae, also have many species of cleaner.

The Cambridge palaeontologist Simon Conway Morris has treated convergent evolution more vividly and thoroughly than anyone else. In his wittily written Life’s Solution he points out that convergent evolution is commonly sold as amazing, astounding, uncanny, etc., but there is no need for this. Far from being especially amazing, it’s exactly what we should expect of natural selection. Convergent evolution is, nevertheless, great for confounding armchair philosophers and others who underestimate the power of natural selection and the magnificence of its productions. In addition to 110 densely packed pages of massively researched endnotes and references to the biological literature, Life’s Solution has three indexes: a general index, a name index and – this must surely be unique – a ‘convergences index’. It runs to five double-column pages and around 2,000 examples of convergence. Of course, not all of them are as impressive as the pillbugs, the moles, the gliders, the sabretooths, or the fish-cleaners but even so …

Independent evolution of cleaners

Convergent evolution can be so impressive, it makes you wonder how we know the resemblance really is convergent. That’s the power of natural selection, the immense yet subtle power that underpins the whole idea of the genetic book of the dead. Pill woodlouse and pill millipede, alike as two pills, how do we know one is a crustacean, the other a distant myriapod? There are numerous tell-tale clues. The deep layers of the palimpsest are never completely over-written. The glyphs of history keep breaking through. And, if all else fails, molecular genetics cannot be denied.

Convergence of animals with widely separated histories is one manifestation of the power of selection to write layer upon layer of the palimpsest. Another is its converse: evolutionary divergence from a common historic origin, natural selection seizing a basic design and moulding and twisting it into an often bizarre range of functionally important shapes. The next chapter goes there.

6 Variations on a Theme

As we saw in Chapter 3, molecular comparison conclusively shows that whales are located deep within the even-toed ungulates, the artiodactyls. By ‘located deep within’, I mean something very specific and surprising. It’s worth repeating. We’re talking about much more than just a shared ancestor, with the whales going one way, and the artiodactyls the other. That would not have been surprising. ‘Deep within’ means that some artiodactyls (hippos) share a more recent ancestor with whales than they share with the rest of the artiodactyls whom they much more strongly resemble. This has been known for more than twenty years, but I still find it almost incredible, so overwhelming is the submersion under surface layers of palimpsest. Of course, this doesn’t mean whales’ ancestors were hippos or even resembled hippos. But whales are hippos’ closest living relatives.

What is it that’s so special about whales, so special that new writings in their book of the dead so comprehensively obliterated almost every trace of that earlier world, of grazing prairies and galloping feet, which must lie buried far down in the palimpsest? How did the whales manage to diverge so completely from the rest of the artiodactyls? How were they able so comprehensively to escape their artiodactyl heritage?

The answer probably lies in that word ‘escape’. Cattle, pigs, antelopes, sheep, deer, giraffes, and camels are relentlessly disciplined by gravity. Even hippos spend significant amounts of time on land, and indeed can accelerate their ungainly bulk to an alarming speed. The land-dwelling artiodactyl ancestors of whales had to submit to gravity. In order to move, land mammals must have legs stout enough to bear their weight. A land animal as big as a blue whale would need legs half way to Stonehenge pillars, and it’d have a hard time surviving, with heart and lungs smothered suffocatingly by the body’s own weight. But in the sea, whales shook off gravity’s tyranny. The density of a mammal body is approximately that of water. Gravity never goes away, but buoyancy tames it. When their artiodactyl ancestors took to the water, whales shed the need for leggy support, and the fossil evidence beautifully lays out the intermediate stages.

A major milestone marks the point where, like dugongs and manatees but unlike seals and turtles, whales gave up returning to land even to reproduce. That was the final release from gravity, as buoyancy totally took over. Whales were free to grow to prodigious size, literally insupportable size. A whale is what happens when you take an ungulate, cut it adrift from the land and liberate it from gravity. All manner of other modifications followed in the wake of the great emancipation, and they richly defaced the ancient palimpsest. Forelegs became flippers, hind limbs disappeared inside and shrank to tiny relics, the nostrils moved to the top of the head, two massive horizontal flukes – lobes stiffened not by bone but by dense fibrous tissue – sprouted sideways to form the propulsive organ. Numerous profound alterations of physiology and biochemistry allowed deep diving, and hugely prolonged intervals between breaths. Whales switched from a (presumed) herbivorous diet to one dominated by fish, squid, and – in the case of the baleen whales – filtered shoals of krill in lavish quantities.

Fish, too, are allowed by buoyancy to adopt bizarre shapes (see pictures here), which gravity on land would forbid. In the case of teleost (bony as opposed to cartilaginous) fish, the buoyancy is perfect, owing to that exquisite device, the swim-bladder, buried deep within the body. By manipulating the amount of gas in the swim-bladder, the fish is able to adjust its specific gravity and achieve perfect equilibrium at whatever happens to be its preferred depth at any time.

I think that’s what makes a home aquarium such a restful furnishing for a room. You can dream of drifting effortlessly through life, as a fish drifts through water in perpetual equilibrium. And it is the same hydrostatic equilibrium that frees fish to assume such an extravaganza of shapes. The leafy sea dragon trails clouds of glorious fronds, and you feel you could almost identify the species of wrack that those fronds mimic. You must peer deep between them to discern that they are parts of a fish: a modified sea horse – which is itself a distorted caricature of the ‘standard fish’ design of more familiar cousins such as trout and mackerel.

Most predatory fish actively seek and pursue prey, and this expends a considerable proportion of the energy obtained from the food caught. Angler fish, of which there are several hundred species sitting on the sea bottom, save energy by luring prey to come to them. The anglers themselves are superbly camouflaged. A fishing rod (modified fin spine) sprouts from the head. At its tip is a lure or bait, which the angler fish waves around in a tempting manner. Unsuspecting prey are attracted to the bait, whereupon the angler opens its enormous mouth and engulfs the prey. Different species of angler favour different baits. With some it resembles a worm, and it jiggles about plausibly as the angler waves its rod. Angler fish of the dark deep sea harbour luminescent bacteria in the tip of the rod. The resultant glowing lure is very attractive to other deep-sea fish, and invertebrate prey such as shrimps. Convergently, snapping turtles rest with their mouth open, wiggling their tongue like a worm, as bait for unsuspecting prey fish.

Sea horses and angler fish are extreme exponents of the adaptive radiation of teleost fish. They also, in their different ways, sport unusual sex lives. The sex life of angler fish is nothing short of bizarre. Everything I said in the previous paragraph applies to female angler fish only. The males are tiny ‘dwarf males’, hundreds of times smaller than females. A female releases a chemical, which attracts a dwarf male. He sinks his jaws into her body, then digests his own front end, which becomes buried in the female’s body. He becomes no more than a small protuberance on her, housing male gonads from which she extracts sperm when she needs to. It is as though she becomes a hermaphrodite, except that ‘her’ testes possess a different genotype from her own, having invaded from outside in the form of the dwarf male locked into her skin.

Freed by buoyancy from the constraints of gravity, fish were able to evolve an astonishing variety of shapes. Pictured: lionfish, weedy sea dragon, marlin, leafy sea dragon, trumpet fish, sunfish, gulper eel, seahorse, puffer, Sloane’s viper fish, ghost pipefish, angler fish.

Many species of fish are livebearers – females get pregnant like mammals and give birth to live young. Sea horses are unusual in that it’s the male who gets pregnant, carries the young in a belly pouch, and eventually gives birth to them. Do you wonder, then, how we define him as male? Throughout the animal and plant kingdoms, the male sex is easily defined as the one that produces lots of small gametes, sperms, as opposed to fewer, larger, eggs.

Adaptive radiation means evolutionary divergence fanning out from a single origin. It is seen in an especially dramatic way when new territory suddenly becomes available. When, 66 million years ago, a celestial catastrophe cleared 76 per cent of all species from the planet, the stage was wide open for mammalian understudies to step into the dinosaurs’ vacated costumes. The subsequent adaptive radiation of mammals was spectacular. From the small, burrowing creatures who survived the devastation, probably by hibernating in safe little underground bunkers, a comprehensive range of descendants, ranging hugely in size and habit, appeared in surprisingly quick time.

On a smaller scale and a much shorter timescale, a volcanic island can spring up suddenly (suddenly by the standards of geological time) through volcanic upwelling from the bottom of the sea. For animals and plants it is virgin territory, barren, untenanted, open to exploitation afresh. Slowly (by the standards of a human lifetime) the volcanic rock crumbles and starts to make soil. Seeds fly in on the wind, or are transported by birds and fertilised with their droppings. From being a black lava desert, the island greens. Winged insects waft in, and tiny spiders parachuting under floating threads of silk. Migrating birds are blown off course, land for recuperation, stay, reproduce; their descendants evolve. Fragments of mangrove drift in from the mainland, and the occasional tree uprooted by a hurricane. Such freak raftings carry stowaways – iguanas, for instance. Step by accidental step, the island is colonised. And then descendants of the colonists evolve, rapidly by geological standards, diversifying to fill the various empty niches. Diversification is especially rich in archipelagos, where driftings between islands happen more frequently than from the mainland to the archipelago. Galapagos and Hawaii are textbook examples.

A volcano is not the only way new virgin territory for evolution can open up. A new lake can do it too. Lake Victoria, largest lake in the tropics and larger than all but one of the American Great Lakes, is extremely young. Estimates range from 100,000 years to a carbon-dated figure of only 12,400 years. The discrepancy is easily explained. Geological evidence shows that the lake basin formed about 100,000 years ago, but the lake itself has dried up completely and refilled several times. The figure of 12,400 years represents the age of the latest refilling, and therefore the age of the current lake in its large geography. And now, here is the astonishing fact.

There are about 400 species of Cichlid (pronounced ‘sicklid’) fish in Lake Victoria, and they are all descended from probably as few as two founder lineages that arrived from rivers within the short time that the lake has existed. The same thing happened earlier in the other great lakes of Africa, the much deeper Lakes Tanganyika and Malawi. Each of the three lakes has its own unique radiation of Cichlid fishes, different from, but parallel to, the others.

Nimbochromis livingstonii and Lamprologus lemairii

Here’s a slightly macabre example of this parallelism. In Lake Malawi (where I spent my earliest bucket-and-spade beach holidays), there is a predatory fish called Nimbochromis livingstonii. It lies on the bottom of the lake pretending to be dead. It even has light and dark blotches all over its body, giving the appearance of decomposition. Deceived into boldness, small fish approach to nibble at the corpse, whereupon the ‘corpse’ suddenly springs into action and devours the small fish. This hunting technique was thought to be unique in the animal kingdom. But then exactly the same trick was discovered in Lake Tanganyika, the other great Rift Valley lake. Another Cichlid fish, Lamprologus lemairii, has independently, convergently, hit upon the same death-shamming trick. And it has the same blotchy appearance, suggestive of death and decay. In both lakes, adaptive radiation independently hit upon the same somewhat gruesome way of getting food. Along with dozens of other ways of life, independently discovered in parallel in the two similar lakes.

My old friend, the late George Barlow, vividly described the three great lakes of Africa as Cichlid factories. His book, The Cichlid Fishes, makes fascinating reading. The Cichlids have so much to teach us about evolution in general and adaptive radiation in particular. Each of the three great lakes has its own, independently evolved radiation of several hundred Cichlid species. All three lakes tell the same story of explosive Cichlid evolution, yet the three histories unfolded entirely independently. All three began with a founder population of very few species. Each of the three followed a parallel evolutionary course of massive radiation into a huge variety of ‘trades’ or ways of life – the same great range of trades being independently discovered in all three lakes.

You might think the oldest lake would have the most species. After all, it’s had the longest time to evolve them. But no. Lake Tanganyika, easily the oldest at about 6 million years, has only (only!) 300 species. Victoria, a baby of only 100,000 years, has about 400 species. Lake Malawi, intermediate in age at between 1 and 2 million years, has the largest species count, probably around 500, although some estimates exceed 1,000. Moreover, the size of the radiation seems unrelated to the number of founder species. The huge radiations in Victoria and Malawi trace back substantially to only one lineage of Cichlids, the Haplochromines. The relatively venerable Lake Tanganyika’s approximately 300 species appear to stem from twelve different founder lineages, of which the Haplochromines are only one.

What all this suggests is that young Lake Victoria’s dramatic explosion of species is the model for all three lakes. All three probably took only tens of thousands of years to generate several hundred species. After the explosive beginning, the typical pattern is probably to stabilise the number, or it may even decrease, such that the final number of species is not correlated with the age of the lake, or with the number of founder species. The Cichlids of Lake Victoria show how fast evolution can proceed when it dons its running shoes. We cannot expect that such an explosive rate is typical of animals in general. Think of it as an upper bound.

And when you work it out, even Lake Victoria’s feat is not quite so surprising as first appears. Although the lake in its present form is only some 12,400 years old, I’ve already mentioned that a lake filled the same shallow basin 100,000 years ago. In the intervening years it has largely dried up several times and refilled, the latest such episode occurring with the refill of 12,400 years ago. Lake Malawi shows how dramatically these lake levels can fall and rise. Between the fourteenth and nineteenth centuries, the water level was more than 100 metres lower than today. Unlike Lake Victoria, however, it came nowhere close to drying up altogether. In its Rift Valley chasm, it is nearly ten times as deep as Victoria. In shallow Lake Victoria, as each drying cycle occurred, the lowering of the water level would have left numerous ponds and small lakes, these becoming reunited at the next iteration of the refill cycle. The temporary isolation of the fish trapped in the residual ponds and small lakes enabled them to evolve separately – no gene flow between ponds. At the next refill of the cycle, they were reunited, but by then they would have drifted apart genetically, too far to interbreed with those who had been stranded in other ponds. If this is correct, the drying/refilling alternation provided ideal conditions for speciation (the technical term for the evolutionary origin of a new species, by splitting of an existing species). And it means that, from an evolutionary point of view, we could regard the true age of Lake Victoria as 100,000 years, not 12,400. Still very young.

Given 100,000 years to play with, what sort of interval between speciation events would yield 400 species, starting, hypothetically, with a single founding species? Is 100,000 years long enough? Here’s how a mathematician might reason: a back-of-the-envelope calculation, making conservative assumptions throughout, to be on the safe side. There are two extremes, two bounds bracketing the possible rate of speciation, depending on the pattern of splitting. The most prolific pattern (an improbable extreme) is where every species splits into two, yielding two daughter species which, in turn, split into two. This pattern yields exponential growth of species numbers. It would take only between eight and nine speciation cycles to yield 400 species (2⁹ is 512). An interval of 11,000 years between speciations would do the trick. The least prolific pattern (also an improbable extreme) is where the founder species ‘stays put’ and successively throws off one daughter species after another. This would require far more speciation events, about 400, to reach the tally of 400 species: a speciation event every 250 years. How to estimate a realistic intermediate between these two extremes? A simple average (arithmetic mean) gives an estimate of between 5,000 and 6,000 years between speciations, which is enough time. Our mathematician, however, might be more cautious and recommend the geometric mean (multiply the two numbers together and take the square root). One reason to prefer it is that it captures the stronger influence of an occasional very bad year. This more conservative estimate asks for an interval of about 1,600 years between speciations. Somewhere between the two estimates is plausible, but let’s bend over backwards to be cautious and use the estimate of 1,600 years. Cichlid fish typically reach sexual maturity in under two years, so let’s again be conservative and assume a two-year generation time. Then we’d need about 800 fish generations between speciation events, in order to generate 400 species in 100,000 years. Eight hundred generations is enough for plenty of evolutionary change.
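For readers who like to check the arithmetic, the back-of-the-envelope reasoning above can be sketched in a few lines of Python. This is only an illustration of the calculation as described: the variable names are invented for the sketch, and because it uses unrounded values (nine doubling cycles give an interval nearer 11,100 years than 11,000), its results come out slightly above the rounded figures of 1,600 years and 800 generations quoted in the text.

```python
import math

TOTAL_YEARS = 100_000    # effective evolutionary age of the lake, from the text
TARGET_SPECIES = 400     # species to be generated from a single founder
GENERATION_YEARS = 2     # conservative cichlid generation time, from the text

# Most prolific extreme: every species splits in two at each cycle.
doubling_cycles = math.ceil(math.log2(TARGET_SPECIES))  # 2**9 = 512 >= 400
fast_interval = TOTAL_YEARS / doubling_cycles           # roughly 11,000 years

# Least prolific extreme: the founder throws off one daughter at a time.
budding_events = TARGET_SPECIES                         # "about 400" events
slow_interval = TOTAL_YEARS / budding_events            # 250 years

# Two ways of choosing an intermediate interval between the extremes.
arithmetic_mean = (fast_interval + slow_interval) / 2
geometric_mean = math.sqrt(fast_interval * slow_interval)

# Generations available between speciation events, using the cautious estimate.
generations_per_speciation = geometric_mean / GENERATION_YEARS

print(f"doubling extreme:  one speciation every {fast_interval:,.0f} years")
print(f"budding extreme:   one speciation every {slow_interval:,.0f} years")
print(f"arithmetic mean:   {arithmetic_mean:,.0f} years")
print(f"geometric mean:    {geometric_mean:,.0f} years")
print(f"generations between speciations: {generations_per_speciation:,.0f}")
```

The geometric mean sits much closer to the smaller of the two intervals than the arithmetic mean does, which is why it is the more cautious choice here.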

How do I know 800 generations is plenty of time? Again, mathematicians can do back-of-the-envelope calculations to assist intuition. One calculation that I like was done by the American botanist Ledyard Stebbins. Imagine that natural selection is driving mouse-sized animals towards larger size. Stebbins, too, bent over backwards to be conservative, by assuming a very weak selection pressure, so weak that it could not be detected by scientists working in the field, trapping mice and measuring them. In other words, natural selection in favour of larger size is assumed to exist but to be so slight and subtle that it is below the threshold of detectability by field researchers. If the same undetectably weak selection pressure were maintained consistently, how long would it take for the mice to evolve to the size of an elephant? The answer Stebbins calculated was about 20,000 generations, the blink of an eye by geological standards. Admittedly, it’s a lot more than our 800 generations, but we weren’t talking about anything so grandiose as mice turning into elephants. We were only talking about Cichlid fishes changing enough to be incapable of interbreeding with other species. Moreover, Stebbins’s assumptions, like ours, were conservative. He assumed a selection pressure so weak that you couldn’t measure it. Selection pressures have actually been measured in the wild, for example on butterflies. Not only are they easily detectable, they are orders of magnitude stronger than the sub-threshold, under-the-radar pressure assumed by Stebbins. I conclude that 100,000 years is a comfortably long time in Cichlid evolution, easily enough time for an ancestral species to diversify into 400 separate species. That’s fortunate, because it happened!

Incidentally, Stebbins’s calculation is an instructive antidote to sceptics who think geological time is not long enough to accommodate the amount of evolutionary change we observe. His 20,000 generations to wreak the change from mouse to elephant is so short that it would ordinarily not be measurable by the dating methods of geologists. In other words, a selection pressure too weak to be detectable by field geneticists is capable of yielding major evolutionary change so fast that it could look instantaneous to geologists.
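To get a feel for how slight a change per generation can be and still add up over 20,000 generations, here is a rough sketch. It is not Stebbins’s own calculation, whose inputs are not given here. It simply assumes a constant proportional increase in body mass each generation, and the two body masses are assumed round numbers chosen for illustration, not figures from the text.

```python
import math

GENERATIONS = 20_000           # Stebbins's figure, from the text
MOUSE_GRAMS = 20.0             # assumed round figure, not from the text
ELEPHANT_GRAMS = 5_000_000.0   # assumed round figure (5 tonnes), not from the text

# Overall factor by which body mass has to increase.
ratio = ELEPHANT_GRAMS / MOUSE_GRAMS

# Constant proportional growth g per generation, chosen so that
# (1 + g) ** GENERATIONS equals the overall ratio.
growth_per_generation = math.exp(math.log(ratio) / GENERATIONS) - 1

print(f"overall increase needed: {ratio:,.0f}-fold")
print(f"increase per generation: {growth_per_generation:.4%}")
```

Under these assumptions the required increase is a small fraction of one per cent per generation, the kind of difference that would be swamped by ordinary variation between individual mice in a field sample.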

The crustaceans are another great group of mostly aquatic animals with spectacular evolutionary radiations, from much more ancient common sources. In this case, it is the modification of a shared anatomy that impresses. Rigid skeletons permit movement only if built up of hinged units, bones in the case of vertebrates, armoured tubes and casings in the case of crustaceans and other arthropods. Because these bones and tubes are rigid and articulated, there is a finite number of them, each one a unit that can be named and recognised across species. The fact that all mammals have almost the same repertoire of nameable bones (206 in humans) makes it easy to recognise evolved differences as distortions of each named bone: ulna, femur, clavicle, etc. The same is true of crustacean skeletal elements, with the bonus that, unlike bones, they are externally visible.

The great Scottish zoologist D’Arcy Thompson took six species of crab and looked at just one unit of the skeleton, the main portion of the body armour, the carapace, of each.

Carapaces of six crab species: Geryon, Corystes, Chorinus, Scyramathia, Lupa, Paralomis

He arbitrarily chose one of the six, it happened to be Geryon (far left), and drew it on a rectangular grid. He then showed that he could approximate the shape of each of the other five, simply by distorting the grid in a mathematically lawful way. Think of it as drawing one crab on a sheet of stretched rubber, then distorting the rubber sheet in mathematically specified directions to simulate five other shapes. These distortions are not evolutionary changes. The six species are all contemporary. No one species is ancestral to any other; they share ancestors who are no longer with us. But they show how easily changes in embryonic development (altered gradients of growth rates, for instance) can yield an illuminating variety of crustacean form with respect to one part of the exoskeleton. D’Arcy Thompson did the same thing with many other skeletal elements including human and other ape skulls.
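The rubber-sheet idea can be made concrete with a toy sketch: take the coordinates of an outline drawn on a grid and pass every point through the same simple rule. The outline below is a made-up square, not a real carapace, and the two rules (a shear and a taper) are hypothetical stand-ins for whatever transformations D’Arcy Thompson actually used.

```python
# A made-up outline on a unit grid (not real crab measurements).
outline = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]


def shear(point, k=0.5):
    """Slide each point sideways in proportion to its height on the grid."""
    x, y = point
    return (x + k * y, y)


def taper(point, k=0.5):
    """Narrow the shape progressively towards the top of the grid."""
    x, y = point
    return (x * (1.0 - k * y), y)


def distort(points, rule):
    """Apply one lawful rule to every point, as if stretching the whole sheet."""
    return [rule(p) for p in points]


print("original:", outline)
print("sheared: ", distort(outline, shear))
print("tapered: ", distort(outline, taper))
```

The point of the sketch is that one rule applied uniformly to the whole grid changes the shape while leaving every part present and in the same order.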

Of course, bodies are not drawn on anything equivalent to stretched rubber. Each individual develops afresh from a fertilised egg. But changes in growth rates, of each part of the developing embryo, can end up looking like the distortions of stretched rubber. Julian Huxley applied D’Arcy Thompson’s method to the relative growth of different body parts in the developing embryo. Such embryological changes are under genetic control, and evolutionary changes in gene frequencies generate evolutionary variety, again looking like stretched rubber. And of course it isn’t just the carapace. The same kind of evolutionary distortion is seen in all the elements of the crustacean body (and the bodies of all animals but often less obviously). You can see how the same parts are present in each specimen, just emphasised to different degrees. The differential emphasis is achieved by different growth rates in different parts of the embryo.

Crustaceans are exceedingly numerous. With characteristic wit, the Australian ecologist Robert May said, ‘To a first approximation, all species are insects,’ yet it has been calculated that there are more individual copepods (crustacean water fleas) than there are individual insects in the world. The painting opposite, by the zoologist Ernst Haeckel (1834–1919), Darwin’s leading champion in Germany, is a dazzling display of the anatomical versatility of the copepods.

Wondrous copepods from Ernst Haeckel’s Art Forms in Nature

Mantis shrimp

Here’s a typical adult crustacean, a mantis shrimp. Well, mantis shrimps (Stomatopods) are typical with respect to their body plan, which, together with their colourful beauty, is why I’ve chosen one for this purpose. But they include some formidable customers who are far from typical in one alarming respect. They pack a punch, literally. With vicious blows from club-like claws, they smash mollusc shells in nature, while in captivity the blow from a large smasher, travelling as fast as a small-calibre rifle bullet, will shatter the glass of your aquarium tank. The energy released is so great that the water boils locally and there is a flash of light. You don’t want to mess with a mantis shrimp, but they’re a wonderful example of the diverse modification of the basic crustacean body plan.

Mantis shrimps are not to be confused with the (literally) stunning ‘pistol shrimps’ or ‘snapping shrimps’ (Alpheidae), who in their way also beautifully illustrate the diversity of crustacea. These have one enlarged claw, somewhat bigger than the other. They snap the enlarged claw with terrific force, generating a shock wave – a violent pulse of extreme high pressure immediately followed by extreme low pressure in its wake. The shock wave stuns or kills prey. The noise is among the loudest heard in the sea, comparable to the bellows and squeaks of large whales. Muscles are too slow to generate high-speed movement such as the snapping claws of pistol shrimps or the punching clubs of mantis shrimps (or indeed the jump of a flea). They store energy in an elastic material or spring, and then suddenly release it – the catapult or bow-and-arrow principle.

Crustacea dazzle with diversity. But it is a constrained diversity. To repeat the point, which is the reason I chose crustaceans for this chapter, you can in every species easily recognise the same parts. They are connected to each other in the same order, while differing hugely in shape and size. The first thing you notice about the basic crustacean body plan is that it is segmented. The segments are arrayed from front to rear like a goods train with trucks (American freight train with wagons or cars). The segmentation of centipedes and millipedes is even more obviously train-like because most of their segments are the same. A mantis shrimp or a lobster is like a train whose trucks are the same in a few respects (wheels, bogies, and coupling hooks, say) but different in other ways (cattle wagons, milk tanks, timber carriers, etc.).

Crustaceans in their evolution achieve astonishing variety by changing the trucks over evolutionary time, while never losing sight of the train. Varied as they are, the segments of a mantis shrimp are still visibly a train built to the same pattern as any other crustacean, each bearing a pair of limbs that fork at the tip. The claw of a crab or lobster is a conspicuous example of the fork. As you move from front to rear of the animal, the paired appendages consist of antennae, various kinds of mouth parts, claws, then four pairs of legs. Move backwards further, and the segments of a lobster or mantis shrimp’s abdomen each have small, jointed appendages called swimmerets underneath, on both sides, each often ending in a little paddle. In a lobster or, even more so, a crab, the segments of the thorax and head are hidden beneath a shared cover, the carapace. But their segmentation is betrayed by the appendages, walking legs in the case of four of them, antennae, large claws and mouth parts at the front end. The rear end of the abdomen, the guard’s van (American caboose) of the train, has a special pair of flattened appendages called uropods. When I first visited Australia, I was intrigued to see, laid out in a buffet, what they call bay bugs. These have what look like uropods at the front end as well as the rear, a sort of crustacean version of Doctor Dolittle’s Pushmi-Pullyu, but with two rear ends instead of two heads. This is not all that surprising, as we shall now see.

The segmentation of arthropods and vertebrates was once thought to have evolved independently. No longer, and thereby hangs a fascinating tale, a tale that is true too of other segmented animals such as annelid worms. Just as the segments are arrayed in series from front to rear like a train, so the genes controlling the segments are arrayed in series along the length of a chromosome. This revolutionary discovery overturned the whole attitude to zoology that I had learned as a student, and I find it wonderful. To pursue the railway analogy, there’s a train of gene trucks in the chromosome to parallel the train of segment trucks in the body.

It’s been known for more than a century that mutant fruit flies can have a leg growing where an antenna ought to be. That mutation is called antennapedia for obvious reasons, and it breeds true. There are other dramatic mutations in fruit flies, for example bithorax, which has four wings like normal insects, instead of the two-winged pattern that gives flies their name, Diptera. These major mutations are all explained by changes in the sequentially arranged genes in the ‘chromosome train’. When I first saw that bay bug in a Great Barrier Reef restaurant, I immediately wondered whether bay bugs had originally evolved by a mutation similar to antennapedia, in this case duplicating uropods at the front end of the animal.

This kind of effect has been neatly shown by Nipam Patel and his colleagues. They work on a marine crustacean called Parhyale, belonging to the Amphipod order. I remember being fascinated by the hundreds of small amphipods in the cold stream on our farm, in the course of which my parents dug out a pool for us to swim. The swarms of exuberantly jumping ‘sandhoppers’ that we so often encounter on beaches are another familiar example. We met isopods, in the flattened shape of ‘pill bugs’, in the previous chapter. Amphipods are different. They are flattened left to right rather than back to belly. And, in Parhyale and many others, their appendages are far from all the same. Some of their legs point in what seems to be the ‘wrong’ direction. Three of the ‘trucks’ appear to be ‘coupled’ up backwards (red shading in left picture on the next page). Patel and his colleagues, by means of ingenious manipulations of the genes controlling the trucks of the train, were able to change the three reversed segments, coupling the trucks so that all the limbs faced in the same direction (right picture). The way this works is that the three backwards segments are replaced by duplicates of the three segments in front of them. The Patel group achieved equally interesting manipulations of other segments but the work, though fascinatingly ingenious, would take us too far afield.

ILLUSTRATION: KALLIOPI MONOYIOS

We vertebrates too are segmented, but in a different way. This is obvious in fish, and it remains pretty clear in our ribs and vertebral column. Snakes carry it to an extreme – sort of like centipedes but with internal ribs instead of external legs. We now understand the embryological mechanism whereby segments are multiplied up. Surprisingly, actually rather wonderfully, it has turned out to be pretty much the same in vertebrates and arthropods. Hence, we understand how it is that different snake species evolve radically different numbers of vertebrae ranging from around 100 to more than 400 – compared to our thirty-three. Vertebrae, whether or not they sprout ribs, all have similar coupling mechanisms to the neighbouring ‘trucks of the train’, and all have similar blood vessels, and sensory and motor nerves, connected to the spinal cord, which passes through them. As I just mentioned, one of the most revolutionary discoveries of recent zoology is that the embryological mechanisms underlying segmentation in arthropods and vertebrates, deep in the lower levels of their palimpsests, are tantalisingly similar. Once again, the truly beautiful fact is that in both groups, genes are laid out along chromosomes in the same order as the segments that they influence.

Although crustaceans all follow the segmented plan boldly written in the depths of the palimpsest, the ‘trucks’ vary so extravagantly that the simile of the train can become rather strained. Sometimes many of the segments join together to form a singular body, as in crabs. Often the appendages sprouting from the segments vary spectacularly, ranging from the formidable claws near the front of a lobster, or the punching clubs of a mantis shrimp, to the swimmerets arrayed under the abdomen. Crustaceans range in size from ‘water fleas’ at less than 1 millimetre to the Japanese spider crab Macrocheira with a limb span that can reach 3 metres (10 feet). Frightening as this creature might be to meet, it is harmless to humans. Imagine the handshake of a lobster, or the punch of a mantis shrimp, that size!

Japanese spider crab

Crabs can be thought of as lobsters with a truncated tail (abdomen) curled up under the main body, so you don’t see it unless you upend the animal. The crab abdomen bears a passing resemblance to the ape/human coccyx, both being made of a handful of segments from an ancestral tail squashed up. Hermit crabs are strictly not crabs, but belong in their own group (Anomura) within the crustacea. Their abdomen is not squashed up underneath them as in true crabs, but soft and curled round to one side, to fit the discarded mollusc shells that hermit crabs inhabit. The process by which they choose their shells, and compete with one another for favoured shells, is fascinating in its own right. But that’s another story. In this chapter they serve as yet another illustration of the wonderful diversity of crustaceans.

The larvae of crustaceans show the group’s diversity at least as gloriously as the adults. But still the basic train design is palpable throughout. Perhaps even more dramatically than in the case of adult crustaceans, it is as though natural selection pulled, pushed, kneaded, or distorted the various segments of the body with wild abandon. Different species of crustacean pass through nameable larval stages, free-living animals in their own right, often leading a very different life from the adults – as caterpillars live very differently from butterflies among the insects. The zoea is one such larval type. It is the last stage before the adult, in crabs, lobsters, crayfish, shrimps, bay bugs, and their kind – the decapod crustaceans.

Overleaf is a page full of assorted zoeas to show how easily the basic crustacean plan can be stretched and bent around in evolution, as though made of modelling clay. What I take away from these exquisite little creatures is that all have the same parts, they just vary the relative sizes and shapes of those parts. They all look like distorted versions of each other. That’s what evolutionary diversification is all about, and the crustacea show it as plainly as any animal group. You can match up the corresponding parts in all the species, and can clearly see how the different species have pulled, stretched, twisted, swelled, or shrunk the same parts in different ways over evolutionary time. It is wondrous to behold, you surely agree.

Crustacean larvae. Always the same parts, yet pulled and pushed in different directions

Zoeas may look a little like the adults they are to become. But they need to survive in a very different world, usually the world of plankton, and their bodies are versatile enough to evolve into all sorts of unlikely distortions – written in surface layers of the palimpsest. Many of them sport long spikes, presumably to make them difficult to swallow. The impressive spikes of the planktonic zoea at top middle are nowhere to be seen in the typical adult crab it is to become. Truth be told, the adult in this case is not easily seen at all under the sea urchin that it habitually carries around on its back – presumably to gain protection via the urchin’s own spikes. Notice the long, prominent abdomen of the larva, with its easily discerned segments. As with all crabs, the adult abdomen is neither long nor prominent but tucked discreetly under the thorax.

An earlier larval stage than the zoea, found in most crustacean life cycles, is the nauplius larva. Unlike zoeas, which bear some sort of resemblance to the adult they will become, naupliuses have an appearance all their own. There’s another larval stage possessed by some crustaceans, the cyprid larva, presumably so called because it resembles the adult of a water flea called Cypris. Perhaps the adult Cypris is an example of the overgrown larva phenomenon, which is a fairly common way for evolution to progress. Below is the cyprid larva of a member of the rather obscure crustacean sub-class, Facetotecta.

Facetotectan larva

This larva is unmistakeably crustacean, with a head shield, and abdominal segments bearing typically crustacean forked appendages. From 1899, when the larvae were first discovered, until 2008, nobody knew what adult facetotectans looked like. And they still have never been seen in the wild. What happened in 2008 was that a group of experimentalists succeeded in persuading larvae to turn into a precursor of the adult. They did it by means of hormone treatment. The subtitle of their paper is ‘Towards a solution to a 100-year-old riddle’. The adults turn out to be soft, unarmoured, slug-like or worm-like creatures with no visible segments and no appendages, presumably parasites, although nobody knows who their victims are. You wouldn’t know, to look at them, that they are crustaceans at all. This experiment recalls a similar one by Julian Huxley with axolotls in 1920. Axolotls are vertebrates, members of the Amphibia. They look like tadpoles; indeed they are tadpoles, but sexually mature tadpoles, and they reproduce. They evolved from larvae who would once have turned into salamanders. The adult stage of their life history was cut off during their evolution, as the larvae became sexually capable. By treating them with thyroid hormone, Julian Huxley succeeded in turning them into the salamanders that their ancestors once were. This experiment may have inspired his younger brother Aldous Huxley to write his novel After Many a Summer, in which an eighteenth-century aristocrat discovered how to cheat death – and developed, 200 years later, into a shaggy, long-armed ape humming a Mozart aria. We humans are ‘larval’ apes!

Those slug-like facetotectans are yet another manifestation of crustacean diversity. They must be descended from adults who had segments and limbs like any respectable crustacean. But the most characteristically crustacean scripts of the palimpsest have been almost completely obliterated by parasitic over-writing, while being retained in the larva. Degenerative evolution of this kind is common in parasites hailing from many parts of the animal kingdom. Within the crustacea, it is also shown to an extreme in certain members of the barnacle family, though not the typical barnacles that encrust rocks at the seaside and prick your bare feet when you walk on them.

As a boy on a seaside holiday, I remember being frankly incredulous when my father told me barnacles are really crustaceans. I thought they were molluscs because, well, they look like molluscs. Nothing like crustaceans, anyway, until you look carefully inside. The barnacles that cling close to rocks look like miniature limpets, while goose barnacles look like mussels on stalks. So how do we know they are really crustaceans? Look inside. Or see Darwin’s own drawing above and you find a shrimp-like creature lying on its back and sweeping the water with its comb-like limbs to filter out swimming morsels of food. As we have by now come to expect, the larvae of barnacles are more unmistakeably crustacean than the adults. Before the adult settles down to its sedentary permanence, it is a free-swimming larva in the plankton. On the left is the nauplius larva of Semibalanus, a small rock barnacle with, for comparison, the nauplius larva of a shrimp, Sicyonia.

Barnacle larva (left) and shrimp larva

Barnacles don’t encrust only rocks. To a barnacle, a whale would seem like a gigantic mobile rock. Not surprisingly, some barnacles make their home on the surface of whales, and there are species of barnacle who live nowhere else. Others ride on crabs, and some of them, especially Sacculina, evolved into the most extreme examples of divergence from normal crustacean form. They moved, in evolutionary time, from the outside of the crabs to the inside, and became internal parasites bearing no apparent resemblance to a barnacle – or even any kind of animal. Parasites often evolve in a direction that could fairly be called degeneration, and Sacculina is an extreme example of this. I shall return to it in the final chapter.

There are many groups of animals that I could have chosen to illustrate evolutionary divergence and variation on a theme. Fish and crustaceans do it perhaps more spectacularly than any other groups, and I chose especially the larvae of crustaceans, partly because, living in the plankton as most of them do, they are less familiar than adult lobsters, crabs, and prawns. I regret that in this book I have been able to show only a small number of them. See the splendid Atlas of Crustacean Larvae, published by Johns Hopkins University Press, for the full and amazing range of diversity that these mesmerising little creatures display. Sir Thomas Browne (1605–82) was unaware of them when he wrote the following, about bees, ants, and spiders, but crustacean larvae might have moved him to even greater eloquence.

Ruder heads stand amazed at those prodigious pieces of nature, Whales, Elephants, Dromedaries and Camels; these I confess, are the Colossus and Majestick pieces of her hand but in these narrow Engines there is more curious Mathematicks, and the civilitie of these little Citizens more neatly sets forth the wisdome of their Maker.

7 In Living Memory

The most recent scripts, those in the top layer of the palimpsest, are those written during the animal’s own lifetime. I said that the genes inherited from the past can be seen as predicting the world into which an animal is going to be born. But genes can predict only in a general way. Conditions change on a timescale faster than the generational turnover with which natural selection can cope. Many details are usefully filled in during the animal’s own lifetime, mostly by memories stored in the brain, as opposed to the genetic book of the dead, in which ‘memories’ are written in DNA. Like gene pools, brains store information about the animal’s world, information that can be used to predict the future, and hence aid survival in that world. But brains can do it on a swifter timescale. Strictly speaking, where learning – indeed, this whole chapter – is concerned, we are talking not about the genetic book of the dead but about the non-genetic book of the living. However, as we shall see, naturally selected genes from the past prime the brain to learn certain things rather than others.

The gene pool of a species is sculpted by the chisels of natural selection, with the result that an individual, programmed as it is by a sample of genes drawn from the well-carved gene pool, tends to be good at surviving in environments that did the carving: that is, an averaged set of ancestral environments. An important part of the body’s equipment for survival is the brain. The brain – its lobes and crevices, its white matter and grey matter, its bewildering byways of nerve cells and highways of nerve trunks – is itself sculpted by natural selection of ancestral genes. The brain is subsequently changed further by learning, during the animal’s lifetime, in such a way as to improve yet further the animal’s survival. ‘Sculpting’ might not seem so appropriate a word here. But the analogy between learning and natural selection has impressed many, not least BF Skinner, a leading – if controversial – authority on the learning process.

Skinner specialised in the kind of learning called operant conditioning, using a training apparatus that later became known as the Skinner Box. It’s a cage with an electrically operated food dispenser. An animal, often a rat or a pigeon, gets used to the idea that food sometimes appears in the automatic dispenser. Built into the wall of the box is a pressable lever or a peckable key. Pressing the lever or key causes food to be delivered, not every time but on some automatically scheduled fraction of occasions. Animals learn to operate the device to their advantage. Skinner and his associates have developed an elaborate science of so-called operant conditioning or reinforcement learning. Skinner Boxes have been adapted to a wide variety of animals. I once saw a film of a rotund gourmand, in a specially reinforced Skinner Box, noisily exercising the lever-bashing skill of his bulbous pink snout. I found it endearing, and I hope the pig enjoyed it as much as I enjoyed the spectacle.

You can train an animal to do almost anything you like, by operant conditioning, and you don’t have to use the automated Skinner Box apparatus. Suppose you want to train your dog to ‘shake hands’, that is, politely raise his right front paw as if to be shaken. Skinner called the following technique ‘shaping’. You watch the animal, waiting until he spontaneously makes a move that you perceive as being slightly in the right direction: an incipient, tentative, upward movement of the right front paw, say. You then reward him with food. Or perhaps not with food but with a signal such as the sound of a ‘clicker’, which he has previously been taught to associate with a food reward. The clicker is known as a secondary reward or secondary reinforcement, where the food is the primary reward (primary reinforcement). You then wait until he moves his right front paw a little further in the right direction. Progressively, you ‘shape’ his behaviour closer and closer to the target you have chosen, in this case ‘shaking hands’. You can use the same shaping technique to teach a dog to do all manner of cute tricks, even useful ones like shutting the door when there’s a cold draught and you are too lazy to get out of your armchair. It is elaborations of the same shaping technique that erstwhile circus trainers employed to teach bears and lions to do undignified tricks.

I think you can see the analogy between behaviour ‘shaping’ and Darwinian selection, the parallel that so appealed to Skinner and many others. Behaviour-shaping by reward and punishment is the equivalent of shaping the bodies of pedigree dogs by artificial selection – domestic breeding. The gene pools of pedigree cattle, sheep, and cats, of racehorses and greyhounds, pigs and pigeons, have been carefully sculpted by human breeders over many generations to improve running speed, milk or wool yield, or in the case of dogs, cats, and pigeons, aesthetic appeal according to various more-or-less bizarre standards. Darwin himself was an enthusiast of the pigeon fancy, and he devoted an early chapter of On the Origin of Species to the power of artificial selection to modify domestic animals and plants.

Now, back to shaping in Skinner’s sense. The animal trainer has a particular end result in mind, such as handshaking in a dog. She waits for spontaneous ‘mutations’ (please note well the quotation marks) of behaviour thrown by an individual animal and selects which ones to reward. As a consequence of the reward, the chosen spontaneous variant is then ‘reproduced’ by the animal itself in the form of a repetition. Next, the trainer waits for a new ‘mutant’ (again please don’t ignore the quotation marks) extension of the desired behaviour. When the dog spontaneously goes a little further in the desired direction of the handshake, she rewards him again. And so on. By a careful regimen of selective rewards, the trainer shapes the dog’s behaviour progressively towards a desired end.
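The trainer’s procedure just described can be written down as a toy simulation. Everything in it is a hypothetical simplification for illustration: behaviour is reduced to a single number (how high the paw is raised, in arbitrary units), the spontaneous ‘mutations’ are small random nudges, and the function name and parameters are invented for the sketch.

```python
import random


def shape_behaviour(target=10.0, trials=500, seed=0):
    """Return the successive rewarded paw heights as the trainer shapes them."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    habit = 0.0                 # the dog's current habitual paw height
    history = [habit]
    for _ in range(trials):
        # A spontaneous 'mutation': the dog does something slightly different.
        attempt = habit + rng.gauss(0.0, 1.0)
        # The trainer rewards only attempts nearer the target than the habit.
        if abs(target - attempt) < abs(target - habit):
            habit = attempt     # the rewarded variant is 'reproduced'
            history.append(habit)
    return history


history = shape_behaviour()
print(f"rewarded steps: {len(history) - 1}")
print(f"start: {history[0]:.2f}  finish: {history[-1]:.2f}  target: 10.00")
```

Unrewarded attempts simply vanish from the record, so the habit can only creep towards the target. The quotation marks around ‘mutation’ matter here too: nothing in the sketch is genetic.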

The analogy with genetic selection is evident and was expounded by Skinner himself. But so far, the analogy is with artificial selection. How about natural selection? What role does reinforcement learning play in the wild, where there are no human trainers? Does the analogy with reward learning extend from artificial selection to natural selection? How does reward learning improve the animal’s survival?

Darwin bridged the gap from domestic breeding to natural selection with his great insight that human breeders aren’t necessary. Human selective breeders – let’s call them gene pool sculptors – are replaced by natural sculptors: the survival of the fittest, differential survival in wild environments, differential success in attracting mates and vanquishing sexual rivals, differential parenting skills, differential success in passing on genes. And just as Darwin showed that we don’t need a human breeder, the analogy with learning does without a human trainer. With no human trainers, animals in the wild learn what’s good for them and shape their behaviour so as to improve their chances of survival.

‘Mutation’ consists of spontaneous trial actions that might be subject to ‘selection’ – i.e. reward or punishment. The rewards and punishments are doled out by nature’s own trainers. When a hen scratches the ground with her feet, the action has a good chance of uncovering food of some kind, perhaps a grub or a seed. And so ground-scratching is rewarded, and repeated. When a squirrel bites the kernel of a nut, it’s hard to crack unless held at a particular angle in the teeth. When the squirrel spontaneously discovers the right angle of attack, the nut cracks open, the squirrel is rewarded, the correct alignment of the teeth on the nut is remembered and repeated, and the next nut is cracked more quickly.

Much depends on the rewards that nature doles out. Food is not the only reward that we can use, even in the lab. Once, for a research project that I needn’t go into, I wanted to train baby chickens to peck differently coloured keys in a Skinner Box. There were reasons not to use food as reward, so I used heat instead. The reward was a two-second blast from a heat lamp, which the chicks found agreeable, and they readily learned to peck keys for the heat reward. But now we need to face the question, what, in general, do we mean by ‘reward’? As Darwinians, we must expect that natural selection of genes is ultimately responsible for determining what an animal treats as rewarding. It’s not obvious what will be rewarding, however obvious it might seem to us because we are animals ourselves.

We may define a reward as follows. If a random act by an animal is reliably followed by a particular sensation and if, in consequence, the animal tends to repeat the random act, then we recognise that sensation (presence of food or warmth or whatever it is) as a reward by definition. If a Skinner Box delivered not food or heat but an attractive and receptive member of the opposite sex, I have no doubt that it would – at least under some circumstances – fit the definition of a reward: an animal in the right hormonal condition would learn to press a key to obtain such a reward. A mother animal, cruelly deprived of her child, would learn to press a key to restore access. And the child would learn to press a key to obtain access to its lost mother. I know of no direct evidence for any of those guesses, nor for my conjecture that a beaver would treat access to branches, stones, and mud suitable for dam-building as a reward by the above definition. And a crow in the nesting season would define access to twigs as a reward. But as a Darwinian, in all those cases I make the prediction with a modicum of confidence.

Brain scientists are able to implant electrodes painlessly in the brains of animals, through which they can stimulate the brain electrically. Normally they do this in order to investigate which parts of the brain control which behaviour patterns. The experimenter controls an animal’s behaviour by passing weak electric currents. Stimulate a chicken’s brain here, and the bird shows aggressive behaviour. Stimulate a rat’s brain there, and the rat lifts its right front paw. The neurologists James Olds and Peter Milner conceived a variant of the technique. They handed the switch over to the rat. By pressing a lever, rats were able to stimulate their own brains. Olds and Milner discovered particular areas of the brain where self-stimulation by rats was highly rewarding: the rats appeared to become addicted to lever-pressing. Not only did electrical stimulation in these brain regions fulfil the definition of a reward; it did so in a big way. When the electrodes were inserted in these so-called pleasure centres, rats would obsessively press the switch, to the extent of unfortunately neglecting other vital activities. They would sometimes press the lever at a rate of 7,000 presses per hour, would ignore food and receptive members of the opposite sex and go for the lever instead, and would run across a grid delivering electric shocks in order to get at the lever. They would press the lever continually for twenty-four hours until the experimenters removed them for fear they’d die of starvation. The experiments have been repeated on humans with similar results. The difference is that humans could verbalise what it felt like:

A sudden feeling of great, great calm … like when it’s been winter, and you have just had enough of the cold, and you go outside and discover the first little shoots and know that spring is finally coming.

Another woman (and you have to wonder whether the experiment was approved by an ethics committee)

quickly discovered that there was something erotic about the stimulation, and it turned out that it was really good when she turned it up almost to full power and continued to push on her little button again and again … she often ignored personal needs and hygiene in favor of whole days spent on electrical self-stimulation.

[Figure: Rat addict]

It seems plausible that natural selection has wired up animal brains in such a way that external stimuli or situations that are good for the animal (which will vary from species to species) are internally connected to the ‘pleasure centres’ discovered by Olds and Milner.

Punishment is the opposite of reward. If an action is reliably followed by a stimulus X and, as a consequence, the animal becomes less likely to repeat the action, then X is defined as a punishment. In the laboratory, psychologists sometimes use electric shock as punishment. More humanely (I guess) they use a ‘time out’ – an interval during which the animal is denied access to reward. Dog trainers (the practice is frowned upon by many experts, rightly in my opinion) sometimes smack an animal as punishment. When I was at boarding school (and this practice is now not only frowned upon but illegal) my friends and I were from time to time caned by the headmaster, hard enough (astonishing as it now seems) to leave bruises that took weeks to heal (and were admired at bath-time like battle scars). What my offences were I have now forgotten, but I’m sure I didn’t forget while I was still at the school and within range of Slim Jim and Big Ben, the two canes in the headmaster’s quiver. My probability of repeating the offence undoubtedly decreased. Therefore, beatings were punishments by definition, as well as by the intention of the headmaster.
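
The definitions of reward and punishment given above are purely operational: a stimulus is labelled by what it does to the probability of repeating the act, not by anything intrinsic to the stimulus. A minimal Python sketch makes that explicit. The function name `classify_stimulus` and the `tolerance` threshold are assumptions made for illustration only; the text specifies no numerical cut-off.

```python
def classify_stimulus(rate_before, rate_after, tolerance=0.05):
    """Label a stimulus solely by its effect on how often the act is repeated.

    rate_before: probability of the act before it was paired with the stimulus
    rate_after:  probability of the act after reliable pairing
    tolerance:   hypothetical margin below which a change counts as no change
    """
    change = rate_after - rate_before
    if change > tolerance:
        return "reward"  # the act became more likely
    if change < -tolerance:
        return "punishment"  # the act became less likely
    return "neutral"


if __name__ == "__main__":
    print(classify_stimulus(0.1, 0.6))  # heat lamp for a chick, perhaps
    print(classify_stimulus(0.5, 0.1))  # the headmaster's cane
```

Note that nothing in the function mentions food, warmth, or pain. On this definition, whatever raises the rate is a reward, and whatever lowers it is a punishment.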

In nature, bodily injury is perceived as painful. If an action is followed by pain, the probability of repeating that action goes down. Not only is that how we define punishment: it also explains what pain is for, in the Darwinian sense. Injury often presages death and hence failure to reproduce. Therefore, the nervous system defines bodily injury as painful.

Sometimes pain is endured when offset by reward. We’ve already seen that rats will endure painful electric shock to get to the self-stimulation lever. The punishment of a bee sting may be offset by the reward of honey. The taste of honey is such an intense reward that many animals, including bears, honey badgers, raccoons, and human hunter-gatherers, are prepared to endure the pain for the sake of it. Rewards and punishments trade off against each other, just as mutually opposing natural selection pressures trade off against each other.

The Darwinian interpretation of pain as a warning not to repeat the preceding action has ethical implications. In our treatment of non-human animals, on farms and hunting fields, in slaughterhouses and bullrings, we are apt to assume that their capacity to suffer is less than ours. Are they not less intelligent than we are? Surely this means they feel pain, if at all, less acutely than us? But why should we assume that? Pain is not the kind of thing you need intelligence to experience.

The capacity to feel pain has been built into nervous systems as a warning, an aid to learning not to repeat actions that caused bodily damage and might next time lead to death. So, if a species is less intelligent, might its pain need to be more agonising, rather than less? Shouldn’t humans, being cleverer, get away with less painful pain in order to learn not to repeat the self-harming action? A clever animal, you might think, could get away with a mild warning, ‘Er, probably a good idea not to do that again, don’t you think?’ Whereas a less intelligent animal would need the sort of dire warning that only excruciating pain can deliver. How should this affect our attitude towards slaughterhouses and agricultural husbandry? Should we not, at the very least, give our animal victims the benefit of the doubt? It’s a thought, to put it at its mildest!

Rewards and punishments, pleasure and pain, are so familiar and obvious to us as human animals that you probably wonder why I am labouring the topic in this chapter. Here is where things start to become less obvious and more interesting. The brain’s choice of what shall constitute reward and what punishment is not fixed in stone. It is ultimately determined by genetic natural selection. Animals come into the world equipped with genetically granted definitions of reward and punishment. These definitions have been made by natural selection of ancestral genes. Any sensation associated with an increased probability of death will become defined as painful. A dislocated limb in the wild dramatically increases the probability of death. And it is intensely painful, as I recently and very vocally testified, all the way to the hospital. It has certainly made me take great care to avoid risking a repeat. Copulation increases the probability of reproduction, and genetic selection has consequently made the accompanying sensations pleasurable – which means rewarding. It has been suggested, with support from rat experiments and from the self-stimulating woman mentioned above, that sexual pleasure is directly linked to the ‘pleasure centres’ discovered by Olds and his colleagues. Presumably other sensations, too, could be so linked by natural selection.

I conjecture that by artificial selection you could breed a race of pigeons who enjoy listening to Mozart but dislike Stravinsky. And vice versa. After many generations of selective breeding, perhaps spread over several human lifetimes, the birds would be genetically equipped with a definition of reward such that they would learn to peck a key that caused a recording of Mozart to be played, and would learn to peck a key that caused a recording of Stravinsky to be switched off. And of course, the experiment would be incomplete unless we also bred a line of pigeons who treated Mozart as punishment and Stravinsky as reward. Let’s not get pedantic as to whether it is really Mozart that they’d treat as rewarding. The learned preference would probably generalise from Mozart to Haydn! The only point I am trying to make is that the definitions of what is rewarding and what is punishing are not carved in stone. They are carved in the gene pool and therefore potentially changeable by selection.

As a corollary, I conjecture that, by artificial selection, you could (though I wouldn’t wish to, and it might take an unconscionable number of generations) breed a race of animals who regarded what had previously been pain as rewarding. By definition, it would no longer be pain! It would be cruel to release them into their species’ natural environment because, of course, they would be unfitted to survive there – that’s the whole point. But the mere fact that they enjoy what normal members of their species would call pain is not cruel – because, however hard it is for us to imagine, at least within the confines of my thought experiment, they enjoy it! Anyway, the more interesting conclusion is that, in a state of nature, it is natural selection that determines what is reward and what is punishment. My thought experiment was devised to dramatise that conclusion.

Experimental psychologists have long known that you can train an animal to treat as a reward something that previously had neutral value for the animal. As mentioned above, it’s called secondary reinforcement, and an example is the clicker used by dog trainers. But secondary reinforcement is not what I’m talking about here, and I really want to emphasise that. I’m not talking about secondary reinforcement, but about genetically changing the very definition of what constitutes primary reinforcement. I conjecture that we could achieve it by breeding, as opposed to training. I called it a conjecture because the experiment has not, as far as I know, been done. I’m now talking about selectively breeding animals in such a way as to change their own genetically instilled definition of what constitutes a primary reward in training. To repeat my suggestion above, I predict that by artificial selection you could in principle breed a race of animals who would treat bodily injury as rewarding.

Douglas Adams carried the point to a wonderful comedic reductio in The Restaurant at the End of the Universe. Zaphod Beeblebrox’s table was approached by a large bovine creature, who announced himself as the dish of the day. He explained that the ethical problem of eating animals had been solved by breeding a species that wanted to be eaten and was capable of saying so. ‘Something off the shoulder, perhaps, braised in a white wine sauce?’

Birds don’t naturally listen to human music, so my Mozart/Stravinsky flight of fancy may seem implausible. But do they have a music of their own? A respected ornithologist and philosopher named Charles Hartshorne suggested that we should regard birdsong as music, appreciated aesthetically by the birds themselves. He may not have been wrong, as I shall soon argue.

The role of learning and genes in the development of birdsong has been intensively studied, especially by WH Thorpe, Peter Marler, and their colleagues and students. Many birds learn to imitate the song of their father or other members of their own species. Spectacular feats of mimicry by the likes of mynahs and lyre birds are an extreme. In addition to mimicking other species such as kookaburras (‘laughing jackass’), lyre birds have been recorded by David Attenborough giving remarkably convincing imitations of car alarms, camera shutters (with or without a motor drive), the chainsaws of lumberjacks and the mixed noises of a building site. I have even heard it said, but have failed to verify it, that lyre birds can distinctly mimic Nikon versus Canon camera shutters. Such virtuoso mimics incorporate an amazing variety of such sounds in an ample repertoire.

This raises the question of why many songbirds have large repertoires in the first place. Individual male nightingales can sport more than 150 recognisably distinct songs. Admittedly that’s an extreme, but the general phenomenon of song repertoires demands an explanation. Given that song serves to deter rivals and attract mates, why not stick to one song? Why switch between alternatives? Several hypotheses have been proposed. I’ll mention just my favourite, the ‘Beau Geste’ hypothesis of John Krebs.

In the adventure yarn of that name by PC Wren, an outnumbered unit of the French Foreign Legion was beleaguered in a desert fort, and the commander beat off the opposing force with a spectacular bluff.

As each man fell, throughout that long and awful day, [the commander] had propped him up, wounded or dead, set the rifle in its place, fired it, and bluffed the Arabs that every wall and every embrasure and loophole of every wall was fully manned.

Krebs’s hypothesis is that the bird with a large repertoire is pretending his territory is already occupied to the full. He is, as it were, mimicking the sounds that would emerge from an area if it were already overpopulated with too many members of his species. This deters rivals from attempting to set up their territory in the area. The more densely populated an area is, the less it will benefit an individual to settle there. Above a certain critical density, it pays an individual to leave and seek territory elsewhere, even an otherwise inferior territory. So, by pretending to be many nightingales, an individual nightingale seeks to persuade others to find a different place to set up their own territories. In the case of lyre birds, the sound of a chainsaw is just another addition to the repertoire, the size of which conveys the message: ‘Go away, there’s no future for you here, the place is fully occupied.’
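
The logic of the bluff can be illustrated with a deliberately crude payoff model. Everything in this sketch is an assumption made for the sake of the example: that a territory's value is shared equally among its occupants, that a prospecting bird estimates the number of residents by counting distinct songs, and the function names `settle_payoff` and `prospector_settles`. It is not Krebs's own formulation.

```python
def settle_payoff(quality, estimated_residents):
    """Hypothetical payoff: the area's quality shared with those already there."""
    return quality / (estimated_residents + 1)


def prospector_settles(quality, distinct_songs_heard, inferior_quality):
    """Decide whether a newcomer stays or moves to an empty, poorer area.

    The newcomer is assumed to count one resident per distinct song heard,
    which is exactly the assumption a large repertoire exploits.
    """
    stay = settle_payoff(quality, distinct_songs_heard)
    leave = settle_payoff(inferior_quality, 0)
    return stay > leave


if __name__ == "__main__":
    # One resident singing one song: the good area is still worth sharing.
    print(prospector_settles(10.0, 1, 4.0))
    # The same single resident singing five songs: the area seems crowded.
    print(prospector_settles(10.0, 5, 4.0))
```

In the second call nothing about the territory has changed except the size of the repertoire, yet under these assumptions the newcomer does better to go elsewhere.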

Virtuoso impressionists like lyre birds, mynahs, parrots, and starlings are outliers. Probably they are just manifesting, in extreme form, the normal way young birds learn their species song – imitating their fathers or other species members. The point of learning the correct species song is to attract mates and intimidate rivals. And now we return to our discussion of the definition of a reward: how natural selection defines what will be treated as reward and what punishment.

In an experiment by JA Mulligan, three American song sparrows (Melospiza melodia) were reared by canaries in a soundproof room so that they never heard the song of a song sparrow. When they grew up, all three produced songs that were indistinguishable from those of typical wild song sparrows. This shows that song sparrow song is coded in the genes. But it is also learned, in the following special sense. Young song sparrows teach themselves to sing, with reference to a built-in template, a genetically installed idea of what their song ought to sound like.

What’s the evidence for this? It is possible surgically, under anaesthetic and I trust painlessly, to deafen birds. This has been done, with both song sparrows and the related white-crowned sparrows, Zonotrichia leucophrys. If birds of either species are deafened as adults, they continue to sing almost normally: they don’t need to hear themselves sing. As adults, that is. If, however, they are deafened when three months old, too young to sing, their song when they reach adulthood is a mess, bearing little resemblance to the correct song. On the template hypothesis, this is because they have to teach themselves to sing, matching their random efforts against the template of correct song for the species. There’s an interesting difference between the two species. Whereas the song sparrow never needs to hear another bird sing – its template is innate – the white-crowned sparrow makes a ‘recording’ of white-crowned sparrow song, early in life, long before it starts to develop its own song. Once the template is in place, whether innate as in the song sparrow or recorded as in the white-crowned, the nestlings then use it to teach themselves to sing.
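
The template hypothesis lends itself to a toy simulation. In the sketch below a 'song' is just a string of note letters, the young bird babbles by changing one note at random, and a variant is kept only if the bird can hear that it matches the template better. A deafened bird gets no such feedback, so its babbling simply drifts. The note alphabet, the scoring rule, and the names `match_score` and `learn_song` are all invented for illustration; real song development is of course far richer.

```python
import random

NOTES = "ABCDEFG"  # hypothetical alphabet of song fragments


def match_score(song, template):
    """Count the positions at which the song agrees with the template."""
    return sum(1 for a, b in zip(song, template) if a == b)


def learn_song(template, attempts=2000, can_hear=True, seed=1):
    """Toy model of self-taught song, with or without auditory feedback."""
    rng = random.Random(seed)
    song = [rng.choice(NOTES) for _ in template]  # initial random babble
    for _ in range(attempts):
        trial = list(song)
        trial[rng.randrange(len(trial))] = rng.choice(NOTES)  # random variant
        if can_hear:
            # A closer match to the template is the reward; keep the variant.
            if match_score(trial, template) > match_score(song, template):
                song = trial
        else:
            # No feedback from the bird's own voice: variants are never selected.
            song = trial
    return "".join(song)


if __name__ == "__main__":
    print(learn_song("ABCDEFGA"))
    print(learn_song("ABCDEFGA", can_hear=False))
```

The template is the same in both runs. What differs is whether the bird can compare its own output against it, which is the distinction the deafening experiments were designed to expose.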

Doves and chickens push this to an extreme: they don’t need to listen to themselves, ever. Ring dove (also known as barbary dove) squabs, who have been surgically rendered completely deaf, later develop vocalisations that are just like those of intact doves. That the behaviour is innate is further testified by the fact that hybrid doves coo in a way that is intermediate between the parental species’ coos. As we shall see in Chapter 9, young crickets (nymphs), before they achieve their final moult to become adults, can artificially be induced to display nerve-firing patterns identical to their species song patterns, even though nymphs never sing. And hybrid crickets have a song that is intermediate between the two parental species.

But I want to get back to the sparrows. As we have seen, they teach themselves to sing by listening to their own random babblings, and repeating those fragments that are rewarded by a match to a template – whether the template is genetically built-in (song sparrow), or a ‘recording’ (white-crowned sparrow) remembered from infancy. Did you notice this means that a sound that matches the template is a reward by our definition? We have identified a new kind of reward to add to food and warmth. The song template is a much more specialised kind of reward. It’s easy to see how food (relief of hunger pangs) and warmth (relief of cold discomfort) would be general, non-specific rewards. Indeed, psychologists of the early twentieth century delighted in reducing all rewards to one simple formula, which they called ‘drive reduction’. Hunger and thirst were seen as examples of ‘drives’, analogous to forces driving the animal. A particular pattern of sounds, complicated and characteristic enough to be recognised, by ornithologists and birds alike, as belonging to one species and one species alone, is a reward of a very different kind from generalised drive-reduction. And, I would personally add, of a much more interesting kind. As a student I tried to read up on that rat psychology literature, and I’m sorry to admit that I found it rather boring compared to the zoology literature on wild animals.

The ethologist Keith Nelson once gave a conference talk with the title ‘Is bird song music? Well, then, is it language? Well, then, what the hell is it?’ It isn’t language: not rich enough in information, and it doesn’t seem to be grammatical in the sense of possessing a hierarchical nesting of ‘clauses’ enclosing ‘sub-clauses’. Hartshorne, as I mentioned previously, thought it was music, and I think there’s a sense in which he was right. I believe we can make a case that birds have an aesthetic sense, which responds to song. I think there’s also a sense in which it works like a drug. In what follows, I am drawing on a pair of papers that I wrote jointly with John Krebs some years ago, about animal signals generally. We were critically responding to a then prevalent idea that animal signals function to convey useful information from the sender to the recipient, for the mutual benefit of both. For example, ‘I am a male of the species Luscinia megarhynchos, I am in breeding condition, and I have a territory over here.’ The gene’s-eye view of evolution, then quite novel, did not sit well with ‘mutual benefit’. Krebs and I followed the gene’s-eye view to a more cynical view of animal signals, substituting the idea of manipulation of the receiver by the signaller. ‘You are a female of the species Luscinia megarhynchos. COME HITHER! COME HITHER! COME HITHER!’

When an animal seeks to manipulate an inanimate object, it has only one recourse – physical power … But when the object it seeks to manipulate is itself another live animal there is an alternative way. It can exploit the senses and muscles of the animal it is trying to control … A male cricket does not physically roll a female along the ground and into his burrow. He sits and sings, and the female comes to him under her own power.

Now, you might object, surely the female should respond to male song in this way only if it benefits her. But we regarded the relationship between signaller and signallee as an arms race, run in evolutionary time. Perhaps she does put up some sales-resistance. But that provokes the male, on the other side of the arms race, to up the ante: increase the intensity of his signal. And now we come to another strand to the argument, which Krebs and I advanced in the second of our two papers. This concerns what we called ‘mind-reading’. Any animal in a social encounter can benefit itself by predicting (behaving as if predicting) the behaviour of another. There are all kinds of give-away clues. If a male dog raises his hackles, this is an involuntary indicator of an aggressive mood. Responding appropriately to such give-aways is what we dubbed ‘mind-reading’. Humans can become quite adept at mind-reading in this sense, making use of such cues as shifty eyes or fidgety fingers. And now, to bring the argument full circle, an animal who is the victim of a mind-reader can exploit the fact of being mind-read, in such a way as to render inappropriate the very word ‘victim’. A male, for instance, might manipulate a female by ‘feeding’ her mind-reading machinery, perhaps with deceptive cues. This is just to say that where victimhood is concerned, manipulation is not a one-way street. Mind-reading turns the tables. And then manipulation potentially turns them back again, against the mind-reader.

On this view animal signals, to repeat, evolve as an arms race between mind-reading and manipulation, an arms race between salesmanship and sales-resistance. In those cases where the sender benefits from being mind-read and the receiver benefits from being manipulated, we suggested that the ensuing signal should shrink to a ‘conspiratorial whisper’. Why escalate a signal when there is no push-back? Conversely – the opposite of a conspiratorial whisper – loud, conspicuous, vivid signals will arise where the recipient does not ‘want’ to be manipulated. In such cases the arms race, in evolutionary time, escalates towards exaggeration on the part of the sender, to combat increased ‘sales-resistance’ on the part of the receiver.

Why, you might wonder, should there ever be ‘sales-resistance’? It’s most easily seen in the case of the arms race between the sexes. You might think it’s always a good idea for males and females to get together and coordinate. You’d be wrong, and for an interesting reason. Ultimately because sperms are smaller and more numerous (‘cheaper’) than eggs, females need to be choosier than males. A male is more likely to ‘want’ to mate with a female than the female will ‘want’ to mate with him. Females pay a higher cost if they mate with the wrong male than males pay if they mate with the wrong female. In extreme cases, there is no such thing as the wrong female. Hence males are more likely to escalate salesmanship when trying to persuade females. And females more likely to favour sales-resistance. Where you see high-amplitude signals – bright colours, loud sounds – that means there’s probably sales-resistance. Where there’s no sales-resistance, signals are likely to sink to a conspiratorial whisper. Conspicuous signals are costly, if not in energy, in risk of attracting predators or alerting prey.

I’ve been a bit terse in condensing two full-sized papers into four paragraphs. It should become clearer when I now apply it to birdsong. Birdsong is too loud and conspicuous to be a ‘conspiratorial whisper’, so let’s go for the other extreme: increased sales-resistance fomenting exaggerated efforts to manipulate. Is birdsong an attempt to manipulate the behaviour of females and other males: an attempt to change their behaviour to the advantage of the singer?

If biologists wish to manipulate the behaviour of a bird, what can they do? This chapter has already introduced one possibility that birds themselves, unfortunately for them, cannot do: electrical stimulation of another’s brain through implanted electrodes. The Canadian surgeon Wilder Penfield pioneered the technique on human patients whose brains were undergoing surgery for other reasons. By exploring different parts of the cerebral cortex, he was able to jerk specific muscles into action like a puppeteer pulling strings. When he drew a map of which parts of the brain pulled which muscles, it looked like a caricature of a human body, the so-called ‘motor homunculus’ (there’s also a ‘sensory homunculus’ on the left-hand side of the picture, which looks rather similar). The grotesque exaggeration of the homunculus’s hand goes some way towards explaining the formidable skill of a concert pianist, for example. And the large brain area given over to the lips and tongue is no doubt related to speech. The German biologist Erich von Holst, working with chickens in a deeper part of the brain, the brain stem, was able to control what might be called the bird’s ‘mood’ or ‘motivation’, resulting in changes to the observed behaviour, including ‘guiding hen to nest’ and ‘uttering call warning of predator’. I repeat that these operations are painless, by the way. There are no pain receptor nerves in the brain.

Now, a male nightingale might well ‘wish’ he could implant electrodes in a female’s brain and control her behaviour like a puppet. He can’t do that, he’s no von Holst, and he has no electrodes. But he can sing. Might song have something like the same manipulative effect? No doubt he might benefit, if only he could inject hormones into her bloodstream. Again, he can’t literally do that. But evidence on ring doves and canaries suggests that birds can do something close to it. Male doves vigorously court females with a display called the bow-coo. The bow is a characteristic movement resembling an unusually obsequious human bow, and it is accompanied by an equally characteristic coo, consisting of a staccato note followed by a purring glissando. A week’s exposure to a bow-cooing male reliably causes massive growth of a female’s ovary and oviduct, with accompanying changes in sexual, nest-building, and incubation behaviour. This was shown by the American animal psychologist Daniel S Lehrman. Lehrman went on to show that the behaviour of male ring doves has a direct effect on the hormones circulating in female bloodstreams. Parallel work by Robert Hinde and Elizabeth Steel in Cambridge on nest-building behaviour in female canaries showed the same thing.

The ring dove and canary type experiments have not been done on nightingales, but it probably is generally the case that male birdsong changes the hormonal state of females. Male song manipulates female behaviour, as though the male had the power to inject her with chemicals, and presumably this is true of nightingales no less than of other species.

My heart aches, and a drowsy numbness pains

My sense, as though of hemlock I had drunk,

Or emptied some dull opiate to the drains

One minute past, and Lethe-wards had sunk.

John Keats was not a bird, but his brain was a vertebrate brain like a female nightingale’s. The male nightingale song drugged him – almost to death in his poetic fancy. If it can so intoxicate the mammal Keats, might it not have a yet more powerful effect on the vertebrate brain that it was designed to beguile, the brain of another nightingale? To answer yes, we hardly need the testimony of the dove and canary experiments. I believe natural selection has shaped the male nightingale’s song, perfecting its narcotic power to manipulate the behaviour of a female, presumably by causing her to secrete hormones.

But now, let’s return to learning, and the deafening experiments. The evidence shows that young white-crowned sparrows and song sparrows teach themselves to sing with reference to a template. Young white-crowneds need to hear song in order to make their ‘recording’ of the template. But any old song won’t do. They have to hear the song of their own species. This shows that, even when the template is recorded, there is an innate component to it, built in by the genes. And in the case of the song sparrow, it doesn’t even need to be recorded.

I suggested above that birdsong might be appreciated as music, enjoyed aesthetically by the birds themselves. We are now in a position to spell out the argument. The male teaches himself to sing by comparing his ‘random’ burblings against a template. The template serves as reward, positively reinforcing those random attempts that happen to match it. Reflect, now, that the male songster has a brain much like the female he later hopes to manipulate. When he teaches himself to sing, he is finding out which fragments of song appeal to a bird of his own species (himself … but later, a female). What is that, if not the employment of aesthetic judgment?

Burble. I like it (conforms to my template). Repeat it.

Burble warble. Ooooh, that’s even better. I like that very much.

It really turns me on. Repeat that too. YES!

What turns him on will probably turn a female on too, for they are, after all, members of the same species with the same typical brain of the species. At the end of the developmental period when the final adult song has been perfected, it will be equally beguiling to the singer himself and his female target. He learns to sing whichever phrases turn him on. There seems no powerful reason to deny that both enjoy an aesthetic experience – as did John Keats when he heard the nightingale.
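The self-teaching loop described above can be sketched in a few lines of Python. This is only a toy illustration, not a model of any real bird: the function name `learn_song`, the representation of song as a string of letters, and the rule that a burble is kept only when it matches the next note of the template are all assumptions made for the sake of the example.

```python
import random

def learn_song(template, alphabet, max_attempts=10_000, seed=0):
    """Toy trial-and-error learner (hypothetical, for illustration only).

    The bird emits random 'burbles' drawn from `alphabet`. A burble is
    rewarded, and so retained, only when it matches the next note of the
    built-in or recorded `template`.
    """
    rng = random.Random(seed)
    song = []
    attempts = 0
    while len(song) < len(template) and attempts < max_attempts:
        attempts += 1
        burble = rng.choice(alphabet)        # a random burble
        if burble == template[len(song)]:    # conforms to the template
            song.append(burble)              # reward: keep it and repeat it
        # otherwise there is no reward and the burble is simply dropped
    return song, attempts

# Example: a four-note repertoire and an eight-note template.
song, attempts = learn_song("ABCABCDD", "ABCD")
print("".join(song), "learned after", attempts, "burbles")
```

The template here plays the part of the reward: nothing outside the bird tells it which burbles to keep.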

We’ve come a long way from the idea of reward as generalised ‘drive reduction’. And we’ve arrived at what I think is a much more interesting place. The lesson of these experiments on birdsong is that reward can be a highly specific stimulus, or stimulus-complex, ultimately laid down by genes: what Konrad Lorenz, one of the fathers of ethology, dubbed the ‘Innate Schoolmarm’.

If this is right, we should predict the following result in a Skinner Box. A young song sparrow who has never heard song should learn to peck a key that yields the sound of song sparrow song, but not a key that yields the song of any other species. That hasn’t been done, but various similar experiments have. Joan Stevenson found that chaffinches preferred to settle on a perch attached to a switch that turned on chaffinch song. However, the control sound for comparison was white noise, not the song of another species. Her chaffinches, moreover, were not naive but wild caught. Her method was adopted by Braaten and Reynolds with hand-reared, naive zebra finches and using starling song for comparison instead of white noise. The finches showed a clear preference for perches that played zebra finch song rather than starling song. It would be nice to do a big experiment with, say, naive young songbirds of six different species, with six perches, each perch playing one of the six songs. We should predict that each species would learn to sit on the perch that played its own species’ song. It wouldn’t be an easy experiment. Hand-rearing baby songbirds is hard work. A neat design might be to give each baby to foster parents of one of the other five species.

The template of song sparrows is innate. The ‘recorded’ template of young white-crowned sparrows, laid down early in life before they start singing, looks like the kind of learning called ‘imprinting’, most closely associated with Konrad Lorenz and his pursuing geese. Imprinting was first recognised in nidifugous baby birds.

Nidifugous, from the Latin, means ‘fleeing the nest’. Nidifugous hatchlings start life equipped with warm and protective downy feathers and well-coordinated limbs. Examples are ducklings, goslings, moorhen chicks, chicken chicks, ground-nesting species generally. Within hours of hatching, as soon as their feathers are dry, nidifugous chicks are up and about, walking competently, looking around alertly, pecking at food prospects, and dogging parental footsteps. The opposite of nidifugous is nidicolous. All songbirds are nidicolous. Nidicolous bird species typically nest in trees. The babies are helpless, naked, incapable of walking (they’re in a nest balanced up a tree, where would they walk to?), incapable of feeding themselves but with a huge gaping beak, a begging organ into which their parents tirelessly shovel food. Many seabirds such as gulls are nidifugous in that they hatch with downy feathers and don’t gape for food. But they are dependent on the parents bringing food that they regurgitate for the chicks.

Mammals, too, have their own equivalent to nidifugous (think gambolling lambs; and wildebeest calves must follow the herd on the day they’re born) and nidicolous (baby mice are hairless and helpless). Man is a nidicolous species. Our babies are almost completely helpless. There has been an evolutionary trade-off between the pressure towards a bigger brain and the difficulty of being born imposed by a large head. The result was to push our babies towards being born earlier, before the head became insufferably (for the mother) large to push out, and so to make us even more helplessly nidicolous than other ape species.

Nidifugous species, both mammals and birds, are in danger if they become separated from their parent(s), and this is where imprinting comes in. Nidifugous babies, as soon as they hatch, do something equivalent to taking a mental photograph of the first large moving object they see. They then follow it about, at first very closely, then venturing gradually further away as they grow older. The first moving object they see is usually their parent, so the system works fine in nature. Goslings hatched in an incubator, however, tend to imprint on a human carer, for example Konrad Lorenz.

The idea of imprinting in mammals is imprinted in child minds by the nursery rhyme ‘Mary had a little lamb’ (Everywhere that Mary went / The lamb was sure to go). Imprinted animals, both birds and mammals, often retain their mental photograph into adulthood and attempt to mate with creatures (such as humans) who resemble it. One of the reasons zoos have difficulty with breeding is that the frustrated animals hanker after their keepers.

Imprinting may or may not be a special kind of learning. Some say it’s just a special case of ordinary learning. It’s controversial. Either way, it’s a nice example of a recent, ‘top layer’ palimpsest script. The genes could have equipped the animal with a built-in image or specification of precisely what to follow, what to mate with, what song to sing. Instead, they equip the animal with rules for colouring in the details.

Reinforcement learning and imprinting are not the only kinds of learning by which an animal, during its own lifetime, supplements the inherited ancestral wisdom. Elephants make important use of traditional knowledge. The brains of old matriarchs contain a wealth of knowledge about such vital matters as where water can be found. Young chimpanzees learn from their elders skills such as using a stone as a hammer to crack nuts, and preparing a twig to probe termite nests. The handover from adept to apprentice is a kind of inheritance, but it is memetic, not genetic. This is why these skills are practised in particular local areas and not others. The skill of sweet potato washing in Japanese macaques is another example. So is pecking through the foil or cardboard lids of milk bottles by British tits, in the days when milk was delivered daily on the doorstep. In this case, the skill was seen to radiate geographically outwards from focal points, in the manner of an epidemic.

What else equips animals to improve on their genetic endowment, apart from learning? Perhaps the most important example of a ‘memory’ not mediated by the brain is the immune system. Without it, none of us would have survived our first infection. Immunology is a huge subject, too big for me to do it justice in this book. I’ll say a few words, just enough to make the point that genes don’t attempt the impossible task of equipping bodies with information about all the bacteria, viruses, and other pathogens that they might ever encounter. Instead, genes furnish us with tools for ‘remembering’ past infections, forearming us against future infection. We carry not just the genetic book of the dead (the ancestral past) but a special molecular book in which is written a continually updated medical record of our infections and how we dealt with them.

[Figure: Geese imprinted on Konrad Lorenz. A special kind of learning, which casts light on the mind of birds.]

Bacteria, too, suffer from infection – by viruses called bacteriophages, or phages for short – and they have their own immune system, which is rather different from ours. When a bacterium is infected, it stores a copy of part of the viral DNA within its own single circular chromosome. These copies have been called ‘mug shots’ of criminal viruses. Each bacterium sets aside a portion of its circular chromosome as a kind of library of these mug shots. The mug shots will later be used to apprehend criminals in the form of the same or related viruses making a reappearance. The bacterium makes RNA copies of the mug shots. These RNA images of ‘criminal’ DNA are circulated through the interior of the bacterial cell. If a virus of a familiar type should invade, the appropriate mug shot RNA binds to it, and special protein enzymes cut up the joined pair, rendering the virus harmless.

The bacterium needs a way to label the mug shots, so they aren’t confused with its own DNA. They are labelled by the presence of adjacent nonsense sequences of DNA, which are palindromes called CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats. Each time a bacterium is assailed by a new kind of virus, another CRISPR-flanked mug shot is added to the CRISPR region of the chromosome. It’s another story, but CRISPR has become famous because scientists have discovered a way in which the bacterial skill can be borrowed for the human purpose of editing genomes.
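The mug-shot library can be pictured with a small Python sketch. It is a deliberately crude caricature under stated assumptions: the class name `Bacterium`, the fixed `spacer_length`, and the choice to copy the first few letters of the viral DNA are all hypothetical, and the sketch ignores the flanking repeats, the RNA copies, and the fact that a real bacterium may not survive its first encounter.

```python
class Bacterium:
    """Toy model of the 'mug shot' library (hypothetical simplification)."""

    def __init__(self, spacer_length=8):
        self.spacer_length = spacer_length
        self.mug_shots = []              # stored fragments of past viruses

    def recognises(self, viral_dna):
        # A virus is familiar if any stored mug shot occurs in its DNA.
        return any(shot in viral_dna for shot in self.mug_shots)

    def encounter(self, viral_dna):
        if self.recognises(viral_dna):
            return "destroyed"           # matched, so the virus is cut up
        # Unfamiliar virus: file a copy of part of its DNA for next time.
        self.mug_shots.append(viral_dna[:self.spacer_length])
        return "recorded"

cell = Bacterium(spacer_length=4)
print(cell.encounter("ACGTTTGA"))   # first meeting with this virus
print(cell.encounter("ACGTTTGA"))   # same virus returns
print(cell.encounter("GGACGTCC"))   # a related virus sharing the fragment
```

Matching on a stored fragment rather than on the whole genome is what lets the sketch catch the same or related viruses making a reappearance.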

The vertebrate immune system works rather differently. It’s more complicated but we too have a ‘memory’ of pathogens of the past. Our immune system is then able to mount a rapid response, should any of those old enemies venture to return. This is why those of us who have had mumps or measles can safely mingle with victims, confident that we shall not get the disease a second time. And the enormous boon of vaccination works by tricking the immune system into building up a false memory, normally by injecting either a killed strain or a weakened strain of the pathogen.

The Covid-19 pandemic was largely stopped in its tracks, saving thousands of lives, by a wonderful new type of vaccine, the mRNA vaccine. The role of mRNA (messenger RNA) is to convey coded messages from DNA in the nucleus to where proteins are made to the code’s specification. Now, here’s how mRNA vaccines work. Instead of a killed or weakened strain of the dangerous virus being injected, a harmless protein in the virus’s jacket is first sequenced. The genetic code appropriate to that protein is then written into mRNA, and it is this mRNA that is injected. Inside the recipient’s cells the mRNA does its thing, which is to code the synthesis of protein – in this case the harmless jacket protein of the Covid virus. And then, the immune system does its thing: having learned to recognise the jacket protein, it attacks the virus if it later enters the body.

What is especially interesting, in pursuit of our analogy between learning and evolution, is that the vertebrate immune system’s ‘memory’ (unlike the bacterial one) works in a kind of Darwinian way, by an internal version of natural selection, within the body. But that is another story, beyond our scope here.

The immune system, and the brain, are the two rich data banks in which entries are written during the animal’s own lifetime, to update the genetic book of the dead, or ‘colour in the details’. More minor examples need mentioning for the sake of completeness. Darkening of the skin is a kind of memory of lying out in the sun. It provides useful screening against the damage that the sun’s rays, especially ultraviolet, can wreak, for example in causing skin cancers. This is a case where genetic and post-genetic scripts both contribute. People whose ancestors have lived many generations in fierce tropical sun tend to be born with dark skin, for example native Australians, many Africans, and people from the south of the Indian sub-continent. By contrast, those whose ancestors have lived many generations at higher latitudes are at risk from too little sun. They tend to lack Vitamin D and hence are prone to rickets. Genetic natural selection at high latitudes has therefore favoured lighter skins. That’s all written in the genetic book of the dead. But this chapter is about palimpsest scripts written after birth, and here is where suntan comes in. Browning in the sun, a post-birth ‘colouring-in’, achieves in light-skinned, high-latitude people a temporary approach towards what is written into the genome of tropical peoples. You could think of the two as short-term memory and long-term memory of sunlight.

Another example is acclimatisation to high altitude. The higher you go, the thinner the atmosphere, where lack of oxygen causes ‘mountain sickness’, whose symptoms include headaches, dizziness, nausea, and complications of pregnancy. People whose ancestors have long lived at high altitude have evolved genetic adaptations such as elevated haemoglobin levels in the blood. Those ‘memories’ of ancestral natural selection are written in the genetic book of the dead. Interestingly, the details differ between Andean and Himalayan peoples, not surprisingly because they have independently, over 10,000 years or more, adapted to a lack of oxygen in mountainous regions widely separated from each other. There are several routes to acclimatisation, and it is not surprising that different mountain peoples have followed different evolutionary paths.

Once again, ancestral scripts can be over-written during the animal’s own lifetime. Lowland people who move to high areas can acclimatise. In 1968, when the Olympic Games were held in Mexico City, national teams deliberately arrived early, in order to train at the high altitude (2,200 metres, more than 7,000 feet) of the Anahuac Plateau. Changes that develop during a period of weeks living at high altitude are written into the post-birth palimpsest layer. As with skin colour, they mimic the older, gene-authored scripts.

Talking of skin colour, the ‘paintings’ of Chapter 2 were all done by ancestral genes, replaying ancestral worlds. But there are some animals who can repaint their skin on the fly, to match the changing background they happen to be sitting on at any given moment. This is another example of the non-genetic book of the living. Chameleons are proverbial, but they aren’t the top virtuosi when it comes to impromptu skin artistry. Flatfish such as plaice can change not just their colour but also their patterning. The one above is capable of changing its colour to match the yellow background on which it now sits. But you only have to take one look at it to read it as a detailed description of the lighter bottom it has just moved off, with its mottled pattern projected by shimmering light from surface ripples.

Even flatfish are upstaged by octopuses and other cephalopod molluscs, who have perfected the art of dynamic cross-dressing to an astonishing extent. And they, uniquely in the animal kingdom, do their changes at high speed. Roger Hanlon, while diving off Grand Cayman, saw a clump of brown seaweed suddenly turn ghostly white and swim rapidly away in a puff of sepia smoke. It was an octopus, with a perfect painting of brown seaweed all over its skin. As Hanlon approached, an emergency order from the octopus brain twitched the muscles controlling the tiny bags of pigment peppering the skin. Instantaneously, the whole surface changed colour from perfect camouflage (trying not to be noticed by predators) to scary white (startling would-be predators). Finally, the puff of dark brown ink deflects the attention of would-be predators away from the fleeing octopus.

[Figure: Thaumoctopus mimicus (upper left) and a sea snake (lower left); Thaumoctopus mimicus (upper right) and a flounder (lower right).]

Hanlon saw an octopus (upper right) in Indonesian waters, Thaumoctopus mimicus, who mimicked a flounder (lower right), not just its appearance but also its behaviour, stopping and starting in jerky glides over the sand surface. What’s the point? Hanlon is unsure, but he suspects it deceives predators who like to bite off a tentacle but cannot cope with a substantial flatfish. This octopus also can put on a show with its tentacles (upper left), making each one resemble a venomous sea snake (lower left) common in tropical waters. Cephalopods can even change their skin’s texture, ruffling up or puckering it into extraordinary shapes. A colleague once dramatised their other-world strangeness by beginning a lecture on cephalopods: ‘These are the Martians.’

The main thesis of this book is that the animal can be read as a description of much older, ancestral environments. This chapter has shown how further details are added, on top of the ancestral palimpsest scripts. Earlier chapters invoked a future scientist, SOF, presented with an animal and challenged to read its body and reconstruct the environments that shaped it. There, we spoke only of ancestral environments, described in the genomic database and its phenotypic manifestations. In this chapter we’ve seen how SOF could supplement her reading of ancestral environments, by additional readings of the more recent past, including the other two great databases that supplement the genes, namely the brain and the immune system. Today’s doctors can read your immune system database and reconstruct a moderately complete history of the infections you have suffered – or been vaccinated against. And if SOF could read what is written in the brain (a big if, she really would have to be a scientist of the future), she could reconstruct much detail of the animal’s past environments in its own lifetime.

Experience, either literal experience stored in the brain as memories, disease experience, or genetic ‘experience’ sculpted into the genome by natural selection, enables an animal to predict (behave as if predicting) what will happen next. But there’s one more trick that the brain can pull off in order to foretell the future: simulation, or imagination. Human imagination is a much grander affair than this but, from the point of view of an animal’s survival, and our analogy between natural selection and learning, we could regard imagination as a kind of ‘vicarious trial and error’. Unfortunately, that particular phrase has been usurped by rat psychologists. A rat in a ‘maze’ (usually just a choice between turning left or right) will sometimes physically vacillate, looking left, right, left, right before finally making up its mind. This ‘VTE’ may be a special case of imagining alternative futures, but it’s probably safest if I reluctantly surrender the phrase itself to the rat-runners and not use it here. Instead, I’ll prefer an analogy with computer simulation: the animal’s brain simulates likely consequences of alternative actions internally, thereby sparing itself the dangers of trying them out externally in the real world.

I said the human imagination is a much grander affair. It finds expression in art and literature. Words written by one person can call up an imagined scene in the brain of another. Gertrude’s lament for Ophelia can move a reader to tears four centuries after the poet’s death. Less ambitiously, let me ask you to imagine a baboon atop a steep cliff. Someone has balanced a plank over the edge of the cliff. Resting at the far end of the plank, over the abyss, is a bunch of bananas. Imagine them, yellow and tempting. The baboon is indeed tempted to venture out along the plank. However, his brain internally simulates the consequence, sees that his extra weight would topple the plank – imagines himself tumbling to his death. So he refrains.

Let’s now imagine a range of brains faced with the bananas on the plank. First, the genetic book of the dead can build in an innate fear of heights. I myself experience a tingling of the spine, which inhibits me from walking within a metre of the edge of a precipice such as the Cliffs of Moher in Western Ireland. This, even when there’s no wind and no reason to suppose that I would fall.

[Figure: The visual cliff.]

A whole genre of experimentation, the so-called ‘visual cliff’ experiment, has been devised to investigate fear of heights. The baby in the picture is quite safe: there’s strong glass over the ‘cliff’. I recently visited one of the world’s tallest buildings where one could stand on toughened glass looking down on the street far below. Perfectly safe, and I watched others walk on the glass, but I avoided doing so myself. Irrational, but innate fears are hard to conquer. Perhaps an innate fear of heights is inherited from tree-climbing ancestors who survived because they possessed it. Not everyone succumbs, of course. These New York construction workers are enjoying a relaxed lunch with evident (though incomprehensible to me) nonchalance.

Death by falling is the crudest route through which a fear of heights might be built into animals. Another way is by learning, reinforced by pain. Young baboons who fall down smaller cliffs are not killed, but they experience pain. Pain, as we’ve seen, is a warning: ‘Don’t do that again. Next time the cliff might be higher, and it will kill you.’ Pain is a kind of vicarious, relatively safe substitute for death. Pain stands in for death in the analogy between learning and natural selection.

[Figure: The ‘detour problem’.]

But now, since you are human with a human power of imagination, you are probably simulating in your brain an unusually bright baboon. He sees himself, in his own imagination, pulling the plank carefully inwards, complete with bananas. Or reaching out with a stick and nudging the bananas along the plank towards him. Probably only highly evolved brains are capable of such simulations. Even dogs (above) perform surprisingly poorly on the so-called ‘detour problem’. But if he succeeds, this imaginative baboon risks no pain and doesn’t fall to his death but does it all by internal simulation. He simulates the fall in his imagination, and consequently refrains from venturing out along the plank. He then simulates the safe solution to the problem and gets the bananas.

I need hardly say that internal simulation of dangerous futures is preferable to the actual actions. Provided, of course, that the simulation leads to accurate prediction. Aircraft designers find it cheaper and safer to test model wings in wind tunnels rather than actual wings on real aeroplanes. And even wind tunnel models are more expensive than computer simulations or analytical calculations, if these can be done. Simulation still leaves some room for uncertainty. The maiden flight of a new plane is still an informative event, however rigorously its parts have been subjected to ordeal by wind tunnel or computer simulation.

Once a sufficiently elaborate simulation apparatus is in place in a brain, emergent properties spring up. The brain that can imagine how alternative futures might affect survival can also, in the skull of a Dante or a Hieronymus Bosch, imagine the torments of Hell. The neurons of a Dalí or an Escher simulate disturbing images that will never be seen in reality. Non-existent characters come alive in the head of the great novelist and in those of her readers. Albert Einstein, in imagination, rode a sunbeam to his place among the immortals with Newton and Galileo. Philosophers imagine impossible experiments – the brain in a vat (‘Where am I?’), atom-for-atom duplication of a human (which ‘twin’ would claim the ‘personhood’?). Beethoven imagined, and wrote down, glories that he tragically could never hear. The poet Swinburne happened upon a forsaken garden on a sea cliff, and his imagination revived a pair of long-dead lovers whose eyes went seaward, ‘a hundred sleeping years ago’. Keats reconstructed the ‘wild surmise’ with which stout Cortez and all his men stared at the Pacific, ‘silent upon a peak in Darien’.

The ability to perform such feats of imagination sprang, emergently, from the Darwinian gift of vicarious internal simulation within the safe confines of the skull, of predicted alternative actions in the unsafe real world outside. The capacity to imagine, like the capacity to learn by trial and error, is ultimately steered by genes, by naturally selected DNA information, the genetic book of the dead.

8 The Immortal Gene

The central idea of The Genetic Book of the Dead grows out of a view of life that may be called the gene’s-eye view. It has become the working assumption of most field zoologists studying animal behaviour and behavioural ecology in the wild, but it has not escaped criticism and misunderstanding, and I need to summarise it here because it is central to the book.

There are times when an argument can helpfully be expressed by contrast with its opposite. Disagreement that is clearly stated deserves a clear reply. I could hypothetically invent the opposite of the gene’s-eye view, but fortunately I don’t need to because the diametric opposite has been put, articulately and clearly, by my Oxford colleague (and incidentally my doctoral examiner, on a very different subject long ago) Professor Denis Noble. His vision of biology is alluring, and is shared by others whose expression of it is less explicit and less clear. Noble is clear. He ringingly hits a nail on the head, but it’s the wrong nail. Here is his lucid and unequivocal statement, right at the beginning of his book Dance to the Tune of Life:

This book will show you that there are no genes ‘for’ anything. Living organisms have functions which use genes to make the molecules they need. Genes are used. They are not active causes.

That is precisely and diametrically wrong, and it will be my business in this chapter to show it.

If genes are not active causes in evolution, almost all scientists now working in the fields known as Behavioural Ecology, Ethology, Sociobiology, and Evolutionary Psychology have been barking up a forest of wrong trees for half a century. But no! ‘Active causes’ is precisely what genes must be: necessarily so if evolution by natural selection is to occur. And, far from being used by organisms, genes use organisms. They use them as temporary vehicles, which they exploit in the service of journeying to future generations. This is not a trivial disagreement, no mere word game. It is fundamental. It matters.

A physiologist of distinction, Denis Noble is captivated by the shattering complexity of the organism, of every last one of its trillions of cells. He sets out to impress his readers with the intricate co-dependency of all aspects of the living organism. As far as this reader is concerned, he succeeds. He sees every part as working inextricably with every other part in the service of the whole. In that service – and this is where he goes wrong – he sees the DNA in the nucleus of a cell as a useful library to be drawn upon when the cell needs to make a particular protein. Go into the nucleus, consult the DNA library there, take down the manual for making the useful protein, and press it into service. I devised that characterisation of Noble’s position during a public debate with him in Hay-on-Wye, and he vigorously nodded his assent. DNA, in Noble’s view, is the servant of the organism, in just the same way as the heart or the liver or any cell therein. DNA is useful to make a particular enzyme when you need it, just as the enzyme is useful for speeding up a chemical reaction … and so on.

Dance to the Tune of Life has the subtitle ‘Biological Relativity’. Noble’s usage of ‘relativity’ has only a tenuous and contrived connection with Einstein’s, but it exactly matches that of the historian Charles Singer in A Short History of Biology:

The doctrine of the relativity of functions is as true for the gene as it is for any of the organs of the body. They exist and function only in relation to other organs.

Now here is Noble some ninety years later. He has the advantage over Singer in that we now know genes are DNA. But his sentiment about biological relativity, in conjunction with the quotation above, resonates perfectly with Singer’s.

The principle of Biological Relativity is simply that there is no privileged level of causation in biology.

I shall argue that, no matter how complicatedly interdependent the parts of a living organism are when we are talking physiology, when we move to the special topic of evolution by Darwinian natural selection there is one privileged level of causation. It is the level of the gene. To justify that is the main purpose of this chapter.

Here’s Singer’s whole vitalistic passage from which I took the above quotation. It’s the peroration of his book and is a perfect prefiguring of Noble’s ‘relativity’.

Further, despite interpretations to the contrary, the theory of the gene is not a ‘mechanist’ theory. The gene is no more comprehensible as a chemical or physical entity than is the cell or, for that matter, the organism itself. Further, though the theory speaks in terms of genes as the atomic theory speaks in terms of atoms, it must be remembered that there is a fundamental distinction between the two theories. Atoms exist independently, and their properties as such can be examined. They can even be isolated. Though we cannot see them, we can deal with them under various conditions and in various combinations. We can deal with them individually. Not so the gene. It exists only as a part of the chromosome, and the chromosome only as part of a cell. If I ask for a living chromosome, that is, for the only effective kind of chromosome, no one can give it to me except in its living surroundings any more than he can give me a living arm or leg. The doctrine of the relativity of functions is as true for the gene as it is for any of the organs of the body. They exist and function only in relation to other organs. Thus the last of the biological theories leaves us where the first started, in the presence of a power called life or psyche which is not only of its own kind but unique in each and all of its exhibitions.

Watson and Crick blew that out of the water in 1953. The triumphant field of digital genomics that they initiated falsifies every single one of Singer’s sentences about the gene. It is true but trivial that a gene is impotent in the absence of its natural milieu of cellular chemistry. Here’s Noble again, bringing Singer up to date but agreeing with his sentiment:

There really is nothing alive in the DNA molecule alone. If I could completely isolate a whole genome, put it in a petri dish with as many nutrients as we may wish, I could keep it for 10,000 years and it would do absolutely nothing other than to slowly degrade.

Obviously a gene in a petri dish cannot do anything, and it would degrade as a physical molecule within months, let alone 10,000 years. But the information in DNA is potentially immortal, and causally potent. And that is the whole point. Never mind the physical molecule and never mind the petri dish. Let the sequence of A, T, C, G triplet codons of an organism’s genome be written on a long paper scroll. Or, no, paper is too friable. To last 10,000 years, carve the letters deep in the hardest granite. To be sure, world-spanning ranges of highland massif would still be too small, but that is a superficial difficulty. In 10,000 years, if scientists still walk the Earth, they will read the sequence and type it into a DNA-synthesising machine such as we already have in early form. They’ll have the embryological knowhow to create a clone of whoever donated the genome in the first place (just a version of the way Dolly the sheep was made). Of course, the DNA information would need the biochemical infrastructure of an egg cell in a womb, but that could be provided by any willing woman. The baby she bears, an identical twin of its 10,000-year dead predecessor, would be a living repudiation of Singer and Noble.

That the information necessary to create the twin could be carved in lifeless granite and left for 10,000 years is a truth that fills me with amazement still, even seventy years after Watson and Crick prepared us for it. Charles Singer would be forced to recant his vitalism, while Charles Darwin, I suspect, would exult.

The point is that, transitory though physical DNA molecules themselves may be, the information enshrined in the nucleotide sequence is potentially eternal. Essential though the surrounding machinery is – messenger RNA, ribosomes, enzymes, uterus and all – they can be provided anew by any woman. But the information in an individual’s DNA is unique, irreplaceable, and potentially immortal. Carving it in granite is a way to dramatise this. But it’s not the practical way. In the normal course of events, DNA information achieves its immortality through being copied. And copied. And copied. Copied indefinitely, potentially eternally, down the generations. Of course, DNA can’t copy itself on its own. Obviously, just as a computer disc can’t copy itself without supporting hardware, DNA needs an elaborate infrastructure of cellular chemistry. But of all the molecules that are involved in the process, however essential they may be for the copying process, only DNA is actually copied. Nothing else in the body is so honoured. Only the information written in DNA.

You might think every part of the body is replicated. Does not every individual have arms and kidneys, and are these not renewed in every generation? Yes, but you’d be utterly wrong if you called it replication in the sense that genes are replicated. Arms and kidneys don’t replicate to make new arms and kidneys. Here’s the acid test, and it really matters. Make a change to an arm, say by a fracture or by pumping iron, and the change is not propagated to the next generation. Make a change in a germline gene, on the other hand, and the mutation may long outlast 10,000 years, copied again and again down the generations.

Before the invention of printing, biblical scriptures were painstakingly copied by scribes at regular intervals to forestall decay. The papyrus might crumble but the information lived on. Scrolls don’t replicate themselves. They need scribes, and scribes are complicated, just as the enzymes involved in DNA replication are complicated. Through the mediation of scribes/enzymes information in scrolls/DNA is copied with high fidelity. Actually, scribes might copy with lower fidelity than DNA replication can achieve. With the best will in the world human copyists make errors, and some zealous scribes were not above a little well-meant improvement. Older manuscripts of Mark 9, 29 quote Jesus as saying that a particular kind of demonic possession can be cured only by prayer. Later versions of the text, not content with mere prayer, say ‘prayer and fasting’. It seems that some zealous scribe, perhaps belonging to a monkish order that especially valued fasting, thought to himself that Jesus must surely have meant to mention fasting, how could he not? So it was scarcely taking a liberty to put the words into his mouth. DNA is capable of higher fidelity of replication than that, but even DNA is not perfect. It does make mistakes – mutations. And in one important respect, DNA is unlike the over-zealous scribe: mutation is never biased towards improvement. Mutation has no way to judge in which direction improvement lies. Improvement is judged retrospectively. By natural selection.

So the information in DNA is potentially eternal even though the physical medium of DNA is finite. And let me repeat why this matters. Only the information contained in DNA is destined to outlive the body. Outlive in a very big way. Most animals die in a matter of years if not months or weeks. Few survive the ravages of decades, almost none centuries. And their physical DNA molecules die with them. But the information in the DNA can last indefinitely. I once attended an evolution conference in America where, at the farewell dinner, we were all challenged to produce an appropriate poem. My limerick ran as follows:

An itinerant Selfish Gene

Said ‘Bodies a-plenty I’ve seen.

You think you’re so clever

But I’ll live for ever:

You’re just a survival machine.’

And I raided Rudyard Kipling for the body’s reply:

What is a body that first you take her,

Grow her up and then forsake her,

To go with the old Blind Watchmaker.

I have emphasised the immortality of the gene in the form of copies. But how big is the unit that enjoys such immortality? Not the whole chromosome: it is far from immortal. With minor exceptions such as the Y-chromosome, our chromosomes don’t march intact down the centuries. They are sundered in every generation by the process of crossing over. For the purposes of this argument, the length of chromosome that should be considered significant in the long run depends upon how many generations it is allowed, by crossing over, to remain intact, when measured against the relevant selection pressures. I expressed this only slightly facetiously in my first book, The Selfish Gene, by saying that the title strictly should have been The slightly selfish big bit of chromosome and the even more selfish little bit of chromosome. A small fragment of chromosome, such as a gene responsible for programming one protein chain, can last 10,000 years. In the form of copies. But only fragments that are successful in negotiating the obstacle course that is natural selection actually do that. It’s arguable that a better book title would have been The Immortal Gene, and I have adopted it as the title of this chapter. As we shall see in Chapter 12, it is no paradox that The Cooperative Gene would also have been appropriate.

How does a gene earn ‘immortality’? In the form of copies, it influences a long succession of bodies so that they survive and reproduce, thereby handing the successful gene on to the next generation and potentially the distant future. Unsuccessful genes tend to disappear from the population, because the bodies they successively inhabit fail to survive into the next generation, fail to reproduce. Successful genes are those with a statistical tendency to inhabit bodies that are good at surviving and reproducing. And they enjoy that statistical tendency, positive or negative, by virtue of the causal influence they exert over bodies. So, we have arrived at the reason why it was profoundly wrong to say that genes are not active causes. Active causes is precisely and indispensably what they must be. If they were not, there could be no natural selection and no adaptive evolution.

‘Cause’ has a testable meaning. How do we ever identify a causal agent in practice? We do it by experimental intervention. Experimental intervention is necessary, because correlation does not imply causation. We remove, or otherwise manipulate, the putative cause, and we strictly must do so at random, a large number of times. Then we look to see whether there tends to be a statistically significant change in the putative effect. To take an absurd example, suppose we notice that the church clock in the village of Runton Acorn reliably chimes immediately after that of Runton Parva. If we’re very naive, we jump to the conclusion that the earlier chiming causes the later. But of course it’s not good enough to observe a correlation. The only way to demonstrate causation is to climb up the church tower in Runton Parva and manipulate the clock. Ideally, we force it to chime at random moments, and we repeat the experiment many times. If the correlation with the Runton Acorn chiming is maintained, we have demonstrated a causal link. The important point is that causation is demonstrated only if we manipulate the putative cause, repeatedly and at random. Of course, nobody would be silly enough to actually do this particular experiment with the church clocks. The result is too obvious. I use it only to clarify the meaning of ‘cause’.
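The logic of the church-clock experiment can be sketched as a toy simulation. This is a minimal sketch under stated assumptions, not a description of any real experiment: the function name `fraction_followed`, the one-second delay, and the hourly schedule are all invented for illustration. In both imagined worlds an undisturbed observer would hear Runton Acorn chime just after Runton Parva; only forcing Runton Parva to chime at random moments tells the two worlds apart.

```python
import random

HOUR = 3600.0       # assumed: each clock is normally scheduled once an hour
CHIME_DELAY = 1.0   # assumed: Runton Acorn chimes one second after Runton Parva


def fraction_followed(acorn_driven_by_parva, trials=1000, seed=0):
    """Force Runton Parva to chime at random moments and report the
    fraction of forced chimes that Runton Acorn follows within two seconds."""
    rng = random.Random(seed)
    followed = 0
    for _ in range(trials):
        parva_time = rng.uniform(0.0, HOUR)  # the random intervention
        if acorn_driven_by_parva:
            acorn_time = parva_time + CHIME_DELAY  # Acorn responds to Parva
        else:
            acorn_time = CHIME_DELAY  # Acorn sticks to its own hourly schedule
        gap = acorn_time - parva_time
        if 0.0 < gap <= 2.0:
            followed += 1
    return followed / trials


print("causal world:    ", fraction_followed(True))
print("non-causal world:", fraction_followed(False))
```

In the causal world every forced chime is followed, so the fraction is 1.0. In the non-causal world the fraction collapses towards zero, because Runton Acorn goes on keeping its own time regardless of what is done in the Runton Parva tower.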

Now back to Denis Noble’s statement that ‘Genes are used. They are not active causes.’ By our ‘church clock’ definition, genes most definitely are active causes because, if a gene mutates (a random change), we consistently observe a change in the body of the next generation – and subsequent generations for the indefinite future. Mutation is equivalent to climbing the Runton Parva tower and changing the clock. By contrast, if there is a non-genetic change in the body (a scar, a lost leg, circumcision, an exaggeratedly muscular arm due to exercise, a suntan, acquired fluency in Esperanto or virtuosity on the bassoon), we do not observe the same thing in the next generation. There is no causal link.

Genetic information, then, is potentially immortal, is causal, and there’s a telling difference between potentially immortal genes that succeed in being actually immortal and potentially immortal genes that fail. The reason some succeed and others fail is precisely that they have a causal influence, albeit a statistical one, on the survival and reproductive prospects of the many bodies that they inhabit, through successive generations and across many bodies through populations. It’s important to stress ‘statistical’. One copy of a good gene may fail to survive to the next generation because the body it inhabits is struck by lightning or otherwise suffers bad luck. More relevantly, one copy of a good gene may happen to find itself sharing a body with bad genes, and is dragged down with them. Statistics enter in because sexual recombination sees to it that good genes don’t consistently share bodies with bad genes. If a gene is consistently found in bodies that are bad at surviving, we draw the statistical conclusion that it is a bad gene. After 10,000 years of recombining, shuffling, recombining again, a gene that remains in the gene pool is a gene that is good at building bodies: in collaboration with the other genes that it tends to share bodies with, and that means the other genes in the gene pool of the species (you may remember from Chapter 1 that the species can be seen as an averaging computer).

In The Selfish Gene, I used the image of the Oxford vs Cambridge Boat Race, the parable of the rowers. Eight oarsmen and a cox all have their part to play, and the success of the whole boat depends upon their cooperation. They must not only be strong rowers, they must be good cooperators, good at melding with the rest of the crew. The rowers, of course, represent genes, and they are arrayed along the length of the boat, as genes are arrayed along a chromosome. It’s hard to separate the roles of the individual oarsmen, so intimate is their cooperation, and so vital is cooperative pulling together for the success of the whole boat. The coach swaps individual rowers in and out of his trial crews. Although it’s hard to judge individual performance by watching them, he notices that certain individuals consistently seem to be members of the fastest trial crews. Other individuals consistently are seen to be members of slower crews. Although single individuals never row on their own, in the long run the best rowers show their mettle in the performance of the successive boats in which they sit.
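The coach's statistical reasoning can be illustrated with a short simulation. It is only a sketch under deliberately crude assumptions: the squad of sixteen, the hidden strengths of 1.0 and 0.0, the noise level, and the name `coach_trials` are all hypothetical, and boat speed is taken to be a simple sum of individual strengths plus luck, which ignores the cooperation between rowers that the parable also stresses.

```python
import random
from collections import defaultdict


def coach_trials(strengths, crew_size=8, trials=4000, noise=0.5, seed=1):
    """Return each rower's mean boat speed over the random trial crews
    in which that rower sat."""
    rng = random.Random(seed)
    rowers = list(strengths)
    total = defaultdict(float)
    count = defaultdict(int)
    for _ in range(trials):
        crew = rng.sample(rowers, crew_size)  # the coach shuffles the crew
        # Assumed: boat speed = sum of hidden strengths + random luck.
        speed = sum(strengths[r] for r in crew) + rng.gauss(0.0, noise)
        for r in crew:
            total[r] += speed
            count[r] += 1
    return {r: total[r] / count[r] for r in rowers if count[r]}


# Hypothetical squad: rowers 0-7 are strong (1.0), rowers 8-15 are weak (0.0).
hidden_strength = {i: (1.0 if i < 8 else 0.0) for i in range(16)}
averages = coach_trials(hidden_strength)
ranked = sorted(averages, key=averages.get, reverse=True)
print("rowers most often found in fast boats:", sorted(ranked[:8]))
```

No rower ever rows alone, and any single trial is dominated by who else happens to be in the boat. Over thousands of reshuffled crews, however, the strong rowers separate out from the weak ones in the averages, which is the sense in which recombination lets natural selection judge individual genes.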

Natural selection sorts out the good genes from the bad, precisely because of the causal influence of genes on bodies. The practical details vary from species to species. Genes that make for good swimmers are ‘good genes’ in a dolphin gene pool but not in a mole gene pool. Genes that make for good diggers are ‘good genes’ in a mole, wombat, or aardvark gene pool but not in a dolphin or salmon gene pool. Genes for expert climbing flourish in a monkey, squirrel, or chameleon gene pool but not in a swordfish, rhinoceros, or earthworm gene pool. Genes for aerodynamic proficiency flourish in a swallow or bat gene pool though not in a hippo or alligator gene pool.

But however varied the details of ‘good’ and ‘bad’ may be from species to species, the central point remains. Depending on their causal influence on bodies, genes either survive or don’t survive to the next generation, and the next, and the next … ad infinitum. Let me put it more forcefully: any Darwinian process, anywhere in the universe – and I’m pretty sure if there’s life elsewhere in the universe it will be Darwinian life – any Darwinian process depends on trans-generational replicated information, and that information must have a causal influence on its probability of being replicated from one generation to the next. It happens that on our planet the replicated information, the causal agent in the Darwinian process, is DNA. It is wrong, utterly, blindingly, flat-footedly, downright wrong, to deny its fundamental role as a cause in the evolutionary process.

Have I laboured the point excessively? Would that it were excessive, but unfortunately there is reason to think that views such as those I have criticised here have been widely influential. Stephen Jay Gould (whose errors were consistently masked by the graceful eloquence with which he expressed them) went so far as to reduce the role of genes in evolution to mere ‘bookkeeping’. The metaphor of the bookkeeper has a dramatic appeal so seductive that it evidently seduced Gould himself. But it’s as wide of the mark as it is possible to be. It is the bookkeeper’s role to keep a passive record of transactions after they happen. When the bookkeeper makes an entry in his ledger, the entry does not cause a subsequent monetary transaction. It is the other way around.

I hope the preceding pages have convinced you that ‘bookkeeping’ is worse than a hollow travesty of the central causal role that genes play in evolution. It is the exact opposite of the truth, a metaphor as deeply wrong as it is superficially persuasive. Gould was also a proponent of ‘multi-level selection’, and this is another respect in which he is seen as an opponent of the gene’s-eye view of evolution (see, for instance, the philosopher Kim Sterelny’s perceptive book Dawkins Versus Gould: Survival of the Fittest). Gould, and others, insisted that natural selection occurs at many levels in the hierarchy of life: species, group, individual, gene. The first thing to say about this is that although there is a persuasive hierarchy, a real ladder, the gene doesn’t belong on it. Far from being the bottom rung of a ladder, far from being on the ladder at all, the gene is set off to one side. Precisely because of its privileged role as a causal agent in evolution. The gene is a replicator. All other rungs in the ladder are vehicles, a term that I shall explain later in this chapter.

As for higher levels of selection, there is, to be sure, a sense in which some species survive at the expense of others. This can look a bit like natural selection at the species level. The native red squirrel in Britain is steadily going extinct as a direct result of the lamentable whim of the 11th Duke of Bedford in the nineteenth century to introduce American grey squirrels. The greys out-compete the smaller reds, and also infect them with squirrel pox, to which they themselves have evolved resistance over many generations in America. Ecological replacement of a species by a competitor species looks superficially like natural selection. But the resemblance is empty and misleading. This kind of ‘selection’ does not foster evolutionary adaptation. It’s not natural selection in the Darwinian sense. You would not say that any aspect of the grey squirrel’s body or behaviour was a device to drive red squirrels extinct, whereas you might happily talk about the Darwinian function of its bushy tail, meaning those aspects of the tail that assisted ancestral squirrels to out-compete rival squirrel individuals of the same species, with a slightly different tail.

In 1988, I published a paper called ‘The Evolution of Evolvability’. This is the closest I have come to supporting something like ‘multi-level selection’. My thesis was that certain body plans, for example the segmented body plans of arthropods, annelids, and vertebrates, are more ‘evolvable’ than others. I quote from that paper:

I suspect that the first segmented animal was not a dramatically successful individual. It was a freak, with a double (or multiple) body where its parents had a single body. Its parents’ single body plan was at least fairly well‑adapted to the species’ way of life, otherwise they would not have been parents. It is not, on the face of it, likely that a double body would have been better adapted … What is important about the first segmented animal is that its descendant lineages were champion evolvers. They radiated, speciated, gave rise to whole new phyla. Whether or not segmentation was a beneficial adaptation during the individual lifetime of the first segmented animal, segmentation represented a change in embryology that was pregnant with evolutionary potential.

I envisioned that my concept of ‘evolvability’ should be regarded as a property of embryology. Thus, a segmented embryology has high evolvability potential, meaning an embryology that lends itself to rich evolutionary divergence. The world tends to become populated by clades with high evolvability potential. A clade is a branch of the tree of life, meaning a group plus its shared ancestor. ‘Birds’ constitutes a clade, for all birds have a single common ancestor not shared by any non-birds. ‘Fish’ is not a clade, because the common ancestor of all fish is shared by all terrestrial vertebrates including us, who are not fish. ‘Mammals’ is a clade, but only if you include so-called ‘mammal-like reptiles’. It would be unhelpful and confusing to call the evolution of evolvability group selection. ‘Clade selection’, a coining of George C Williams, fits the bill.

What other criticisms of the gene’s-eye view should we consider? Many would-be critics have pointed out that there is no simple one-to-one mapping between a gene and a ‘bit’ of body. Though true, that’s not a valid criticism at all, but I need to explain it because some people think it is. You know those gruesome butchers’ diagrams, where a map of a cow’s body is defaced by lines representing named ‘cuts’ of meat: ‘rump’, ‘brisket’, ‘sirloin’, etc? Well, you can’t draw a map like that for domains of genes. There’s no ‘border’ you can draw on the body, marking where the ‘territory’ of one gene ends and that of the next one begins. Genes don’t map onto bits of body; they map onto timed embryological processes. Genes influence embryonic development, and a change in a gene (mutation) maps onto a change in a body. When geneticists notice a gene’s effects, all they are really seeing is a difference between individuals that have one version (‘allele’) of the gene and individuals that don’t. The units of phenotype that geneticists count, or trace through pedigrees, traits such as the Hapsburg jaw, albinism, haemophilia, or the ability to smell freesias, loop the tongue, or disperse the froth on contact with beer, are all identified as differences between individuals. For, of course, countless genes are involved in the development of any jaw, Hapsburg or not; any tongue, loopy or not. The Hapsburg jaw gene is no more than a gene for a difference between some individuals and other individuals. Such is the true meaning whenever anyone talks of a gene ‘for’ anything. Genes are ‘for’ individual differences. And, just as the eyes of a geneticist are focused on individual differences in phenotype, so also, precisely and acutely, are the eyes of natural selection: differences between those who have what it takes to survive and those who don’t.

As for the all-important interactions between genes in influencing phenotype, here’s a better metaphor than the butcher’s map. A large sheet hangs from the ceiling, suspended from hooks by hundreds of strings attached to different places all over the sheet. It may help the analogy to consider the strings as elastic. The strings don’t hang vertically and independently. Instead, they can run diagonally or in any direction, and they interfere with other strings by cross-links rather than necessarily going straight to the sheet itself. The sheet takes on a bumpy shape, because of the interacting tensions in the tangled cat’s-cradle of hundreds of strings. As you’ve guessed, the shape of the sheet represents the phenotype, the body of the animal. The genes are represented by the tensions in the strings at the hooks in the ceiling. A mutation is either a tug towards the hook or a release, perhaps even a severing of the string at the hook. And, of course, the point of the parable is that a mutation at any one hook affects the whole balance of tensions across the tangle of strings. Alter the tension at any one hook, and the shape of the whole sheet shifts. In keeping with the sheet model, many, if not most, genes have ‘pleiotropic’ (multiple) effects, as defined in Chapter 4.

[Figure: A balance of tensions]

For practical reasons, geneticists like to study the minority of genes that do have definable, seemingly singular effects, like Gregor Mendel’s smooth or wrinkled peas, for example. But even such ‘major genes’ often have a surprisingly miscellaneous collection of other pleiotropic effects, sprinkled seemingly at haphazard around the body. And it’s not surprising that this should be so: genes exert their effects at many stages of embryonic development. It’s only to be expected, therefore, that they’ll have pleiotropic consequences even at opposite ends of the body. A change in tension at one hook leads to a comprehensive shapeshift, all over the whole sheet.

There’s no one-to-one mapping, then, from single gene to single ‘bit’ of body. We have no butcher’s map here. But not by a jot or even a tittle does this fact threaten the gene’s-eye view of evolution. However pleiotropic, however complicated and interactive the effects of a gene may be, you can still add them all up to derive a net positive or net negative effect of a change (mutation) in its influence on the body: a net effect on its chances of surviving into the next generation. Such causal influences on a gene’s own survival in the gene pool come unscathed through the complications, notwithstanding numerous interactions with other genes – the other genes with which it jointly affects the tensions in all the strings. When the gene in question mutates, the whole shape of the sheet may shift, with perhaps lots of pleiotropic changes all over the body. But the net effect of all these changes, in different parts of the body, and in interaction with many other genes, must be either positive or negative (or neutral) with respect to survival and reproduction. That is natural selection.

The tension in the genetic strings is affected too by environmental influences. See these as yet more strings tugging from the side, rather than from hooks in the ceiling. The developing animal is, of course, influenced by the environment as well as by the genes, always in interaction with the genes. But again, this doesn’t matter one iota to the gene’s-eye view of evolution. To the extent that, under available environmental conditions, a change in a gene causes a change in that gene’s chances of making it through the generations (either positive or negative), natural selection will occur. And natural selection is what the gene’s-eye view is all about.

So much for that criticism of the gene’s-eye view. What else do we have? Granted that genes are active causes in evolution, it is the whole individual body that we observe behaving as an active agent. This fact, too, is often wrongly seen as a weakness of the gene’s-eye view. Yes, of course, it is the whole animal who possesses executive instruments with which to interact with the world – legs, hands, sense organs. It’s the whole animal who restlessly searches for food, trying first this avenue of hope, then switching to another, showing all the symptoms of questing appetite until consummation is reached. It is the individual animal who shows fear of predators, looks vigilantly up and around, jumps when startled, runs in evident terror when pursued. It is the individual animal who behaves as a unitary agent when courting the opposite sex. It is the individual animal who skilfully builds a nest, and works herself almost to death caring for her young.

The animal, the individual animal, the whole animal, is indeed an agent, striving towards a purpose, or set of purposes. Sometimes the purpose seems to be individual survival. Often it is reproduction and the survival of the individual’s children. Sometimes, especially in the social insects, it is the survival and reproduction of relatives other than children – sisters and nieces, nephews and brothers. My late colleague WD Hamilton (he of the palimpsest postcard in Chapter 1) formulated the general definition of the exact mathematical quantity that an individual under natural selection is expected to maximise as it engages in its purposeful striving. It includes individual survival. It includes reproduction. But it includes more, because genes are shared with collateral relatives, and gene survival can therefore be fostered by enabling the survival and reproduction of a sister or a nephew. He gave a name to the exact quantity that an individual organism should strive to maximise: ‘inclusive fitness’. He condensed his difficult mathematics into a long and rather complicated verbal definition:

Inclusive fitness may be imagined as the personal fitness which an individual actually expresses in its production of adult offspring as it becomes after it has been first stripped and then augmented in certain ways. It is stripped of all components which can be considered as due to the individual’s social environment, leaving the fitness which he would express if not exposed to any of the harms or benefits of that environment. This quantity is then augmented by certain fractions of the quantities of harm and benefit which the individual himself causes to the fitnesses of his neighbours. The fractions in question are simply the coefficients of relationship appropriate to the neighbours whom he affects: unity for clonal individuals, one-half for sibs, one-quarter for half sibs, one-eighth for cousins … and finally zero for all neighbours whose relationship can be considered negligibly small.

Pretty convoluted? A bit hard to read? Well, it has to be convoluted because inclusive fitness is a hard idea. It’s necessarily convoluted in my view because looking at it from the individual’s point of view is an unnecessarily convoluted way of thinking about Darwinism. It all becomes blessedly simple if you dispense with the individual organism altogether and go straight to the level of the gene. Bill Hamilton himself did this in practice. In one of his papers, he wrote:

let us try to make the argument more vivid by attributing to the genes, temporarily, intelligence and a certain freedom of choice. Imagine that a gene is considering the problem of increasing the numbers of its replicas, and imagine that it can choose between causing purely self-interested behaviour by its bearer … and causing ‘disinterested’ behaviour that benefits in some way a relative.

See how clear and easy to follow that is, compared to the previous quotation on inclusive fitness. The difference is that the clear passage adopts the gene’s-eye view of natural selection. The difficult passage is what you get when you re-express the same idea from the point of view of the individual organism. Hamilton gave his blessing to my half-humorous informal definition: ‘Inclusive fitness is that quantity that an individual will appear to be maximising, when what is really being maximised is gene survival.’
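Hamilton's verbal definition of inclusive fitness, quoted earlier, amounts to a piece of arithmetic: start from the 'stripped' personal fitness, then add each harm or benefit the individual causes to a neighbour, weighted by the coefficient of relationship. The following is a minimal sketch of that bookkeeping. The coefficients are those listed in the quotation; the function name, the dictionary, and the numbers in the worked example are invented for illustration.

```python
# Coefficients of relationship as listed in Hamilton's verbal definition.
RELATEDNESS = {
    "clone": 1.0,
    "sib": 0.5,
    "half_sib": 0.25,
    "cousin": 0.125,
    "unrelated": 0.0,  # relationship "negligibly small"
}


def inclusive_fitness(stripped_fitness, effects):
    """stripped_fitness: personal fitness with all effects of the social
    environment removed.
    effects: (kind_of_relative, harm_or_benefit_caused) pairs, where a
    benefit is positive and a harm is negative."""
    return stripped_fitness + sum(
        RELATEDNESS[kind] * effect for kind, effect in effects
    )


# Hypothetical individual: stripped fitness 2.0, helps a full sib by 1.0,
# helps a cousin by 2.0, and harms an unrelated neighbour by 3.0.
example = inclusive_fitness(
    2.0, [("sib", 1.0), ("cousin", 2.0), ("unrelated", -3.0)]
)
print(example)  # 2.0 + 0.5*1.0 + 0.125*2.0 + 0.0*(-3.0) = 2.75
```

The harm done to the unrelated neighbour drops out entirely, because its weighting is zero, while help given to the sib counts at half its face value.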

Entity      Role          Maximises
Gene        Replicator    Survival
Organism    Vehicle       Inclusive fitness

[Photograph: Bill Hamilton]

The individual organism, in my terminology, is a ‘vehicle’ for survival of copies of the ‘replicators’ that ride inside it. The philosopher David Hull got the point after an extensive correspondence with my then student Mark Ridley, but he substituted the word ‘interactor’ for my ‘vehicle’. I never quite understood why. Depending on your preference you can see either the vehicle or the replicator as the agent that maximises some quantity. If it’s the vehicle, then the quantity maximised is inclusive fitness, and rather complicated. But equivalently, if it’s the replicator, the quantity maximised is simple: survival. I don’t want to downplay the importance of vehicles as units of action. It is the individual organism who possesses a brain to take decisions, based on information supplied by senses, and executed by muscles. The organism (‘vehicle’) is the unit of action. But the gene (‘replicator’) is the unit that survives. On the gene’s-eye view, the very existence of vehicles should not be taken for granted but needs explaining in its own right. I essayed a kind of explanation in ‘Rediscovering the Organism’, the final chapter of The Extended Phenotype.

Replicators (on our planet, stretches of DNA) and vehicles (on our planet, individual bodies) are equally important entities, but they play different, complementary roles. Replicators may once have floated free in the sea but, to quote The Selfish Gene, ‘they gave up that cavalier freedom long ago. Now they swarm in huge colonies, safe inside gigantic lumbering robots’ (individual bodies, vehicles). The gene’s-eye view of evolution does not play down the role of the individual body. It just insists that that role (‘vehicle’) is a different kind of role from that of the gene (‘replicator’).

Successful genes, then, survive in bodies down the generations, and they cause (in a statistical sense) their own survival by their ‘phenotypic’ effects on the bodies that they inhabit. But I went on to amplify the gene’s-eye view by introducing the notion of the extended phenotype. For the causal arrow doesn’t stop at the body wall. Any causal effect on the world at large – any causal effect that can be attributed to the presence of a gene as opposed to its absence, and that influences the gene’s chances of survival – may be regarded as a phenotypic effect, of Darwinian significance. It has only to exert some kind of statistical influence, positive or negative, on the chances of that gene’s surviving in the gene pool. I must now revisit the extended phenotype, for it is, to me, an important part of the gene’s-eye view of evolution.

[Figure: Alternative titles for The Selfish Gene, all true to its content]

9 Out Beyond the Body Wall

Imagine the furore if Jane Goodall reported seeing chimpanzees building an amazing stone tower in a forest clearing. They meticulously select stones of the correct shape for the purpose, rotating each one until it snugly fits neighbouring stones. Then the chimps cement it securely in place before picking out another stone. They evidently like to use two radically different sizes of stones, small ones to build the walls themselves, and much larger ones to provide outer fortification and structural strength, the all-important supporting walls. The discovery would be a sensation, headline news, the subject of breathless BBC discussions. Philosophers would jump on it, there’d be passionate debates about personhood, moral rights, and other topics of philosophical moment. The tower is ill-suited to housing its builders. If not functional, then, is it some kind of monument? Does it have ritual or ceremonial significance like Stonehenge? Does the tower show that religion is older than mankind? Does it threaten the uniqueness of man?

The edifice pictured is a real animal construction, but not one built by chimpanzees; the reality is much smaller, and it doesn’t stand up like a monument but lies flat on the bottom of a stream. It is the house of a little insect, the larva of a caddis fly, Silo pallipes. Caddis adults fly in search of mates and live only a few weeks, but their larvae grow for up to two years under water, living in mobile homes that they build for themselves out of materials gathered from their surroundings, cementing them with silk that they secrete from glands in the head. In the case of Silo pallipes (see top left of picture) the building material is local stone. Its astonishing building skills were unravelled by Michael Hansell, now our leading expert on animal architecture in general.

These larvae are master masons. Just look at the delicate placing of the small stones between the carefully chosen large ones buttressing the sides. Hansell showed how they select stones, choosing by size and shape but not by weight. Ingenious experiments in which he removed parts of the house showed how the larvae fit appropriate stones in the gaps, and cement them in place. Just as impressive is the log house at top right of the picture. This was built not by a caddis larva but by a caterpillar, a so-called bagworm. Caddises in water and bagworms on land have converged independently on the habit of building houses from materials that they gather from their surroundings. The picture shows a selection of caddis and bagworm houses.

[Figure: If only chimps had the skills of a caddis larva…]

The word ‘phenotype’ is used for the bodily manifestation of genes. The legs and antennae, eyes and intestines are all parts of the caddis’s phenotype. The gene’s-eye view of evolution regards the phenotypic expression of a gene as a tool by which the gene levers itself into the next generation – and, by implication, an indefinite number of future generations. What this chapter adds is the notion of the extended phenotype. Just as the shell of a snail is part of its phenotype, its shape, size, thickness, etc. being affected by snail genes, so the shape, size, etc. of a stone caddis house or twiggy bagworm cocoon are all manifestations of genes. Because these phenotypes are not part of the animal’s own body, I refer to them as extended phenotypes.

These elegant constructions must be the products of Darwinian evolution, no less than the armoured body wall of a lobster, a tortoise, or an armadillo. And no less than your nose or big toe. This means they have been put together by the natural selection of genes. Such is the Darwinian justification for speaking of extended phenotypes. There must be genes ‘for’ the various details of caddis and bagworm houses. This means only that there must be, or have been, genes in the insects’ cells, variants of which cause variation in the shape or nature of houses. To conclude this, we need assume only that these houses evolved by Darwinian natural selection, an assumption that no serious biologist would dispute, given their elegant fitness for purpose. The same is true of the nests of potter wasps, mud dauber wasps, and ovenbirds. Built of mud rather than living cells, they are extended phenotypes of genes in the bodies of the builders.

While their grasshopper cousins sing with serrated legs, male crickets sing with their wings, scraping the top of one front wing against a rough ‘file’ on the underside of the other front wing. Among their songs, the ‘calling song’ is loud enough to attract females within a certain radius, and to deter rival males. But what if it could be amplified, widening the catchment area for pulling females? Some kind of megaphone, perhaps? We use a megaphone as a simple directional amplifier, which works by ‘impedance matching’. No need to go into what that means, except to say that, unlike an electronic amplifier, it adds no extra energy. Instead, it concentrates the available energy in a particular direction. Could a cricket grow a megaphone out of its horny cuticle – a phenotype in the conventional sense? Like the remarkable backwards-facing trombone of the dinosaur Parasaurolophus, which probably served as a resonator for its bellowings. Crickets could have evolved something like that. But an easier material was to hand, and mole crickets exploited it.

[Figure: caddis and bagworm houses]

[Figure: extended phenotypes built of mud: potter wasp, mud dauber, ovenbird]

Mole crickets, as their name suggests, are digging specialists. Their front legs are modified to form stout spades, strongly resembling those of moles, albeit on a smaller scale. The similarity, of course, is convergent. Some species of mole crickets are so deeply committed to underground life that they cannot fly at all. Given that a mole cricket could benefit from a megaphone, and given that it digs a burrow, what more natural than to shape the burrow as a megaphone? In the case of Gryllotalpa vineae it is a double megaphone, like an old-fashioned clockwork gramophone with two horns. Henry Bennet-Clark showed that the double horn concentrates the sound into a disc section rather than letting it dissipate in all directions as a hemisphere. Bennet-Clark was able to hear a single Gryllotalpa vineae (a species he discovered himself) from 600 metres away. The range of no ordinary cricket comes close.

[Figure: Parasaurolophus]

Assuming it’s as beautifully functional as it seems to be, the mole cricket’s megaphone must have evolved by natural selection, as a step-by-step improvement, in just the same way as the digging hand or as any part of the cricket’s own body. Therefore, there must be genes controlling horn shape, just as there are genes controlling wing shape or antenna shape. And just as there are genes controlling the patterning of cricket song itself. If there were no genes for horn shape, there would be nothing for natural selection to choose. Once again, remember that a gene ‘for’ anything is only ever a gene whose alternative alleles encode a difference between individuals.

[Figure: mole cricket and mole]

[Figure: mole cricket with double megaphone burrow]

Now, when contemplating the double megaphone (or, for that matter, the houses of caddises and bagworms) you might be tempted to say something along the following lines. Cricket burrows are not like wings or antennae. They are the product of cricket behaviour, whereas wings and antennae are anatomical structures. We are accustomed to the idea of anatomical structures being under the control of genes. Can the same be said of behaviour, of cricket digging behaviour, or the sophisticated stonemasonry behaviour of a caddis larva? Yes, of course it can. And there is nothing to stop it being said of artifacts that are produced by the behaviour. The artifacts are just one further step in the causal chain from gene to protein to … a long cascade of processes in the embryo, culminating in the adult body.

There are numerous studies of the genetics of behaviour, including, as it happens, the genetics of cricket song. I want to discuss this work because, weirdly, behaviour genetics arouses a scepticism never suffered by anatomical genetics. Cricket song (though not specifically mole cricket song) has been the subject of penetrating genetic research by David Bentley, Ronald Hoy, and their colleagues in America. They studied two species of field cricket, Teleogryllus commodus from Australia and Teleogryllus oceanicus, also Australian but found in Pacific islands too. Adult crickets who have been brought up in isolation from other crickets sing normally. Nymphs who have not yet undergone their final moult to adulthood never sing, but in the laboratory their thoracic ganglia can be induced to emit nerve impulses with a time-pattern identical to the species song pattern. These facts strongly suggest that the instructions for how to sing the species song are coded in the genes. And those genes must be relevantly different in the two species, for their song patterns are different. This is beautifully confirmed by hybridisation experiments.

In nature these two Teleogryllus species don’t interbreed, but they can be induced to do so in the laboratory. The diagram, from Bentley and Hoy, shows the songs of the two species and of various hybrids between them. All cricket songs are made up of pulses separated by pauses. T. oceanicus (A in the picture) has a ‘chirp’ consisting of about five pulses followed by a series of about ten ‘trills’, each trill always made up of two pulses, closer to each other than the pulses of the chirp. We hear a rhythmic repetition pattern of trills. To my ears the trills sound slightly quieter than the chirps. After about ten of these double-pulse trills there’s another chirp. And the cycle repeats rhythmically, over and over again indefinitely. T. commodus (F) has a similar pattern of alternating chirps and trills. But instead of a series of ten or so double-pulse trills, there is only one long trill or perhaps two, between chirps.

[Figure: songs of pure-bred and hybrid crickets]

Now to the interesting question: what about the hybrids? Hybrid songs (C and D) are intermediate between those of the two parent species (A and F). It makes a difference which species is the male (compare C with D), but we needn’t go into that here, interesting though it is for what it might tell us about sex chromosomes. In any case, hybrid song is a beautiful confirmation of genetic control of a behaviour pattern. Further evidence (B and E) comes from crossing hybrids with each of the two wild species (what geneticists call a backcross). If you compare all five songs, you’ll note a satisfying generalisation: hybrid songs resemble the two wild species’ songs in proportion to the number of genes the hybrid individual has inherited from each species. The more oceanicus genes an individual has, the more its song resembles wild oceanicus rather than commodus. And vice versa. As your eyes move down the page from oceanicus towards commodus, the more you detect resemblance to commodus song. This suggests that several genes of small effect (‘polygenes’) sum their effects. And what is not in doubt is that the species-specific song patterns that distinguish these two species of crickets are coded in the genes: a nice example of how behaviour is just as subject to genetic control as anatomical structures are. Why on earth shouldn’t it be? The logic of gene causation is identical for both. Both are products of a chain of causation, with the behaviour having one more link in the chain.
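The idea of several genes of small effect summing their contributions can be made concrete with a toy calculation. The Python sketch below is only an illustration, not Bentley and Hoy's analysis. It assumes a single made-up song measure (trills per cycle, set to 10 for oceanicus and 1.5 for commodus, loosely following the description of the two songs), it assumes that B is the backcross to oceanicus and E the backcross to commodus, and it ignores the sex-chromosome effect that separates C from D. The function name and the gene fractions are hypothetical.

```python
def additive_song_trait(oceanicus_value, commodus_value, oceanicus_fraction):
    """Toy additive model: the trait is a weighted average of the two
    parental values, weighted by the expected share of genes from each species."""
    return (oceanicus_fraction * oceanicus_value
            + (1.0 - oceanicus_fraction) * commodus_value)


# Illustrative values only (assumptions, not measured data).
OCEANICUS_TRILLS = 10.0   # "about ten" double-pulse trills per cycle
COMMODUS_TRILLS = 1.5     # "one long trill or perhaps two"

# Expected fraction of oceanicus genes in each cross (assumed labelling).
crosses = [
    ("A: pure oceanicus", 1.00),
    ("B: hybrid x oceanicus backcross", 0.75),
    ("C/D: first-generation hybrid", 0.50),
    ("E: hybrid x commodus backcross", 0.25),
    ("F: pure commodus", 0.00),
]

predicted = [
    (label, additive_song_trait(OCEANICUS_TRILLS, COMMODUS_TRILLS, fraction))
    for label, fraction in crosses
]

for label, value in predicted:
    print(f"{label:35s} predicted trills per cycle: {value:.3f}")
```

Under these assumptions the predicted value slides steadily from the oceanicus figure to the commodus figure as the share of oceanicus genes falls, which is the pattern the eye picks out when moving down the page of real songs.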

You could do a similar study of the genetics of megaphone-building behaviour. But you might as well go to the next step in the causal chain, the megaphone itself. Do a genetic study of differences between megaphones. They are extended phenotypes of mole cricket genes. This has not been done, but nothing prevents it. Again, nobody has studied the genetics of caddis houses, but there’s no reason why they shouldn’t, although there might be practical difficulties in breeding them in the lab. Michael Hansell was once giving a talk at Oxford, on the building behaviour of caddis larvae. In passing, he was lamenting his failed attempts to breed caddises in the lab, for he wished he could study their genetics. At this, the Professor of Entomology growled from the front row: ‘Haven’t you trrrried cutting their heads off?’ It seems that the insect brain exercises inhibitory influences such that beheading can be expected to have a releasing effect.

If you were to succeed in breeding caddises in captivity, you could systematically select changes in caddis houses over generations. Or you could artificially select for mole cricket megaphone size or shape, generation by generation, breeding from those individuals whose horns happen to be wider, or deeper, or of a different shape. You could breed giant megaphones, just as you might breed giant antennae or mandibles.

That would be artificial selection, but something like it must have happened through natural selection. Whether by artificial or natural selection, the evolution of larger megaphones could come about only by differential survival of genes for megaphone size. For the megaphone to have evolved in the first place as a Darwinian adaptation, there had to be genes for megaphone shape. The notion of the extended phenotype is a necessary part of the gene’s-eye view of evolution. The extended phenotype should be an uncontroversial addition to Darwinian theory.

But aren’t those ‘genes for megaphone shape’ really genes for altered digging behaviour, which is part of the ‘ordinary’ phenotype of the cricket? Aren’t genes for caddis house shape ‘really’ genes for building behaviour, that is to say, ‘ordinary’ phenotypic manifestations within the body? Why talk about ‘extended’ phenotypes outside the body at all? Well, you could equally well say that the genes for altered digging behaviour are ‘really’ genes for changed wiring in the ganglia in the thorax. And genes for changes in the thoracic ganglia are, in turn, ‘really’ genes for changes in cell-to-cell interactions in embryonic development. And they, in turn, are ‘really’ … and so on back until we hit the ultimate ‘really’. Genes are really really really only genes for changed proteins, assembled according to the rules for translating the sixty-four possible DNA triplet codons into twenty amino acids plus a punctuation mark. I repeat, because it is important, we have here a chain of causation whose first steps (DNA codons choosing amino acids) are knowable, whose final step (megaphone shape) is observable and measurable, and whose intermediate steps are buried in the details of embryology and nerve connections – perhaps inscrutable but necessarily there. The point is that any one of those many intermediate steps in the chain of causation could be regarded as ‘phenotype’, and could be the target of selection, artificial or natural. There is no logical reason to stop the chain at the animal’s body wall. Megaphone is ‘phenotype’, every bit as much as nerve-wiring is phenotype. Every one of those steps, both in the cricket’s body and extended outside it, can be regarded as caused by gene differences. 
Just the same is true of the chain of causation leading from genes to caddis house, even though the behavioural step, the actual building itself, involves sophisticated trial and error in the selection of suitable stones and rotating them into position to fit the existing structure. And now to advance the argument a stage further. The extended phenotype of a gene can reach into the body of a different individual.

Natural selection doesn’t see genes for digging behaviour directly, nor does it see neuron circuitry directly, nor indeed megaphone shape directly. It sees, or rather hears, song loudness. Gene selection is what ultimately matters, but song loudness is the proxy by which gene selection is mediated, via a long series of intermediates. But even song loudness is not the end of the causal chain. As far as natural selection is concerned, song loudness only matters insofar as it attracts females (and deters males, but let’s not complicate the argument). The causal chain extends to a radius where it exerts an influence on a female cricket. This has to mean that a change in female behaviour is part of the extended phenotype of genes in a male cricket. Therefore, the extended phenotype of a gene can reside in another individual. The general point I am aiming towards is that the phenotypic expression of a gene can extend even to living bodies other than the body in which the genes sit. Just as we can talk of a gene ‘for’ a Hapsburg lip, or a gene ‘for’ blue eyes, so it is entirely proper to talk of a gene (in a male cricket) ‘for’ a change in another individual’s behaviour (in this case a female cricket).

We saw in Chapter 7 that song in male canaries and ring doves has a dramatic effect on female ovaries. They swell hugely, with a corresponding rush of hormones and all that it entails. The consequent changes in female behaviour and physiology are in truth phenotypic expression of male genes. Extended phenotypic expression. You may deny it only if you deny Darwinian selection itself.

Ears are not the only portals into a female dove’s brain through which a male’s genes might exert an extended phenotypic influence. Male birds of many species glow with conspicuous colours. These cannot be good for individual survival, but they are still good for the survival of the genes that fashioned them. They achieve this good by assisting individual reproduction at the expense of individual survival. With few exceptions, it is males that sacrifice their personal longevity on the altar of gene survival, through sexually attractive coloration. In those species such as pheasants or birds of paradise, where males dazzle, females are usually drabber in colour, often well camouflaged. Bright coloration in males is favoured, either through attracting females or through besting rival males. In both cases, the naturally selected genes for bright coloration have extended phenotypic expression in the changed behaviour of other individuals. I don’t know whether exposure to a male peacock fan causes peahen ovaries to change, as male dove bow-cooing song does to female dove ovaries. It wouldn’t surprise me. I’d even be surprised if it didn’t.

Unfortunately, predators tend to have eyes like the eyes of the females whom the male is seeking to impress. What is conspicuous to one will probably be conspicuous to all. It’s worth it to the male, or rather to the genes that coloured him. Even if his finery costs him his life, it can already have paid its way in previous success with females. But is there some way a male bird could manipulate females via their eyes without calling attention to himself? Could he shed his dangerously conspicuous personal phenotype, offloading it to an extended phenotype at a safe distance from his own body? ‘Shed’ and ‘offload’, of course, must be understood over evolutionary time. We aren’t talking about shedding feathers in an annual moult, although that happens too – perhaps for the same reason. Black-headed gulls, for instance, shed their conspicuously contrasting face masks as soon as the breeding season is over.

Bower birds are a family of birds inhabiting the forests of New Guinea and Australia. Their name comes from a remarkable and unique habit. They build ‘bowers’ to seduce females. The skills needed to build a bower could be seen as a distant derivative of nest-building skills, and perhaps ultimately derived from them. But the bower is emphatically not a nest. No eggs are laid in it, no chicks reared there. Female bower birds build nests to house eggs as other birds do, and their nests don’t resemble male bowers.

The bower’s sole purpose is to attract females, and males take enormous pains in their creation. First, they clear stray leaves and other debris from the arena in which the bower is to be built. Then the bower itself is assembled from twigs and grass. The details vary from species to species. Some resemble a Robinson Crusoe hat, some a grand archway, others a tower. The final stage of bower design is, I think, the most remarkable of all. The ground in front of and under the bower is colourfully and – I can’t resist saying – tastefully decorated. The male gathers decorative objects – coloured berries, flowers, even bottle tops. Movies of male bower birds at work irresistibly remind me of an artist putting the finishing touches to a canvas, standing back, head cocked judgmentally, then darting forward to make a delicate adjustment, standing back again and surveying the effect with head on one side before darting forward again. That is what emboldened me to use a word like ‘tastefully’. It is hard to resist the impression that the bird is exercising his aesthetic judgement in perfecting a work of art. Even if the decorated bower is not to every human’s taste, or even every female bower bird’s, the ‘touching up’ behaviour of the male almost forces the conclusion that the male has taste of his own, and he is adjusting his bower to meet it.

Remember the discussion in Chapter 7, where I suggested that when male songbirds learn to sing, they are exercising their own aesthetic judgement? The evidence shows, you’ll remember, that young birds burble at random, choosing, by reference to a template, which random fragments to incorporate into their mature song. The male, I argued, has a similar brain to a female of his own species. Not surprisingly, therefore, whatever appeals to him can be expected to appeal to her. The development of song in the young bird could be regarded as a work of creative composition in which the male adopts the principle of ‘whatever turns me on will probably appeal to a female too’. I see no reason to refrain from a similar aesthetic interpretation of bower-building. ‘I like the look of a heap of blue berries just there. So there’s a good chance that a female of my own species will like it too … And perhaps a single red flower over there … or, no, it looks better here … and better still, slightly to the left, and why not set it off with some red berries?’ Of course, I am not literally suggesting that he thinks it through in so many words.

Species differ as to their preferred decoration colours, as well as the shape of their bowers. The satin bower bird (here) goes for blue, a fact that may be connected with the blue-black sheen of his plumage, or the species’ brilliant blue eyes. The male satin bower bird who built this bower has discovered blue drinking straws and bottle tops, and laid out a rich feast of blue to delight the female eye. More soberly, the Great Bower Bird, Chlamydera nuchalis, says it with shells and pebbles (opposite).

The bower is an extended phenotype of genes in the body of the male bower bird. An external phenotype, which presumably has the advantage that its extravagance is not worn on the body and therefore will not call predators’ attention to the male himself. I do not know whether exposure to a more than usually magnificent bower stimulates a hormone surge in the blood of a female, but again the research on ring doves and canaries would lead me to expect this.

We are accustomed to thinking of genes as being physically close to their phenotypic targets. Extended phenotypes can be large, and far distant from the genes that cause them. The lake flooded by a beaver’s dam is an extended phenotype of beaver genes, extended in some cases over acres. The songs of gibbons can be heard a kilometre away in the forest, howler monkeys as much as five kilometres: true genetic ‘action at a distance’. These vocalisations have been favoured by natural selection because of their extended phenotypic effect on other individuals. Chemical signals can achieve a great range among moths. Visual signals require an uninterrupted line of sight, but the principle of genetic action at a distance remains. The gene’s-eye view of evolution necessarily incorporates the idea of the extended phenotype. Natural selection favours genes for their phenotypic effects, whether or not those phenotypic effects are confined to the body of the individual whose cells contain the genes.

In 2002, Kim Sterelny, editor of the journal Biology and Philosophy, marked the twentieth anniversary of the publication of The Extended Phenotype by commissioning three critical appraisals, plus a reply from me. The special issue of the journal came out in 2004. The criticisms were thoughtful and interesting, and I tried to follow suit in my reply, but all this would take us too far afield here. I concluded my piece with a humorously grandiose fantasy about the building of a future Extended Phenotypics Institute. This pipedream edifice was to have three wings, the Zoological Artifacts Museum (ZAM), the laboratory of Parasite Extended Genetics (PEG), and the Centre for Action at a Distance (CAD). The subjects covered by ZAM and CAD have dominated this chapter. PEG must wait till the final chapter. Parasites often exert dramatic extended phenotypic effects on their hosts, manipulating the host’s behaviour to the parasite’s advantage, often in bizarrely macabre ways. The parasite doesn’t have to reside in the body of the host, so there is an overlap with CAD, the Action at a Distance wing. Cuckoo chicks are external parasites who exert extended phenotypic influence over the behaviour of their foster parents. And cuckoos are so fascinating they deserve a chapter of their own. For a different reason, now to be explained.

10 The Backward Gene’s-Eye View

The previous two chapters constituted my short reprise of the gene’s-eye view of evolution as I explained it in The Selfish Gene and The Extended Phenotype. I want, now and in the next chapter, to offer the gene’s-eye view in another way, a way that is particularly suitable for The Genetic Book of the Dead. This is to imagine the view seen by a gene as it ‘looks’ backwards at its ancestral history. A vivid example concerns the cuckoo. To which deplorable bird we now turn.

‘Deplorable bird’? Of course I don’t really mean that. The phrase amused me in a Victorian bird book belonging to my Cornish grandparents, where it referred to the cormorant. Each page of the book was devoted to one species. When you turned to the cormorant’s page, the very first sentence to greet you was, ‘There is nothing to be said for this deplorable bird.’ I can’t remember what grudge the author held against the cormorant. He might have had better grounds with the cuckoo, which is certainly deplorable from the point of view of its foster parents but, as a Darwinian biologist, I think it is a supreme wonder of the world. ‘Wonder’, yes, but there’s also an element of the macabre in the spectacle of a tiny wren devotedly feeding a chick big enough to swallow it whole.

Everyone knows that cuckoos are brood parasites who trick nesting birds of other species into rearing their young. ‘Cuckoo in the nest’ is proverbial. John Wyndham’s The Midwich Cuckoos, about aliens implanting their young in unwitting human wombs, is one of several works of fiction whose titles sound the cuckoo motif. Then there are cuckoo bees, cuckoo wasps, and cuckoo ants who, in their own hexapod ways, hijack the nurturing instincts of other species of insect. The cuckoo fish, a kind of catfish from Lake Tanganyika, drops its eggs among the eggs of other fish. In this case the hosts are ‘mouthbreeders’, fish belonging to the Cichlid family who take their eggs and young into their own mouths for protection. The cuckoo fish’s eggs and later fry are welcomed into the unsuspecting host’s mouth, and tended as lovingly as the mouthbreeder’s own.

Plenty of bird species have independently evolved their own versions of the cuckoo habit, for example the cowbirds of the New World, and cuckoo finches of Africa. Within the cuckoo family itself (Cuculidae), 59 of the 141 species parasitise other species’ nests, the habit having evolved there three times independently. In this chapter, unless otherwise stated, for the sake of brevity I use the name cuckoo to mean Cuculus canorus, the so-called common cuckoo. Alas, it’s not common anymore, at least in England. I miss their springtime song even if their victims don’t, and was delighted to hear it on a recent visit to a beautiful, remote corner of western Scotland where it ‘shouts all day at nothing’. My main authority – indeed today’s world authority – is Professor Nick Davies of Cambridge University. His book Cuckoo is a delightful amalgam of natural history and memoir of his field research on Wicken Fen, near Cambridge. Described by David Attenborough as one of the country’s greatest field naturalists, he achieves heights of lyrical word-painting unsurpassed in the literature of modern natural history:

North towards the horizon is the eleventh-century cathedral of Ely, which sits on the raised land of the Isle of Ely, from where Hereward led his raids against the Normans. In the early mornings, when the mist lies low, the cathedral appears as a great ship, sailing across the fens.

The ruthlessness of the cuckoo begins straight out of the egg. The newly hatched chick has a hollow in the small of the back. Nothing sinister about that, you might think. Until you are told the sole use to which it is put. The cuckoo nestling needs the undivided attention of its foster parents. Rivals for precious food must be disposed of without delay. If it finds itself sharing the nest with either eggs or chicks of the foster species, the hatchling cuckoo fits them neatly into the hollow in its back. It then wriggles backwards up the side of the nest and tosses the competing egg or chick out. There is, of course, no suggestion that it knows what it’s doing, or why it is doing it, no feelings of guilt or remorse (or triumph) in the act. The behavioural routine simply runs like clockwork. Natural selection in ancestral generations favoured genes that shaped nervous systems in such a way as to play out this instinctive act of (foster) fratricide. That is all we can say.

And there’s no more reason to expect the foster parents to know what they are doing when they fall for the cuckoo’s trick. Birds are not little feathered humans, seeing the world through the lens of intelligent cognition. It makes at least as much sense to see the bird as an unconscious automaton. This helps us understand the otherwise surprising behaviour of foster parents. A pioneering cinematographer of the cuckoo’s dark ways was Edgar Chance, avid ornithologist of the early twentieth century. By Nick Davies’s account of his film, a mother meadow pipit appeared totally unconcerned as it watched its own precious offspring being murdered by the cuckoo chick in its nest. The mother then left on a foraging trip, as if nothing untoward had happened. When she returned, she pointlessly fed her chick as it lay dying on the ground. From a human cognitive point of view, her behaviour makes no sense: neither the impassive watching of the initial murder nor the subsequent futile feeding of the doomed chick. We shall meet this point again and again throughout the chapter.

The name ‘cuckoo’ is derived from the simple, two-note tune of the male bird’s song, so simple indeed that some ornithologists downgrade it from ‘song’ to ‘call’ (on parallel grounds to the hysterically unpopular downgrading of Pluto to sub-planet status). The cuckoo’s song (or call) is commonly described as dropping through a minor third, but I’m happy to quote no less an authority than Beethoven in support of my hearing it as a major third. His famous cuckoo in the Pastoral Symphony descends from D to B Flat. Whether major or minor, whether song or call, it is simple – and perhaps has to be simple because the male never gets a chance to learn it by imitation. A cuckoo never meets either biological parent. It knows only its foster parents, who could belong to any of a variety of species, each with its own song, which the young cuckoo must not learn. So the male cuckoo’s song has to be hard-wired genetically, and a kind of common sense concludes, not very confidently, that it should therefore be simple.

Now we approach the remarkable story that earns the cuckoo its place in a chapter on genes ‘looking backwards in time’. Cuckoo eggs mimic the colour and patterning of the other eggs in the particular foster nest in which they sit. And they mimic them even though many different foster species are involved, with very different eggs. Here is a clutch of six brambling eggs plus one cuckoo egg. The only way I, and doubtless you, can tell which one is the cuckoo egg is by its slightly larger size.

At first sight, such egg mimicry might seem no more remarkable than the ‘paintings’ of Chapter 2. Well, that’s quite remarkable enough! But now look at the next picture showing a parasitised nest of meadow pipit eggs.

Again, you can spot the tell-tale size of the cuckoo egg. But what is really noticeable is that the cuckoo egg in the second picture is dark with black speckles like meadow pipit eggs, whereas the cuckoo egg in the first picture is light and with rusty speckles like brambling eggs. Meadow pipit eggs are dramatically different from brambling eggs. Yet cuckoo eggs achieve a near-perfect colour match in each of the two nests.

Once again, the mimicry might seem par for the course, all of a piece with the lizard, frog, spider, or ptarmigan ‘paintings’ of Chapter 2. It would indeed be relatively unremarkable if the cuckoos that parasitise bramblings were a different species from the cuckoos that parasitise meadow pipits. But they aren’t. They’re the same species. Males breed indiscriminately with females reared by any foster species, so the genes of the whole species are mixed up as the generations pass. That mixing is what defines them all as of the same species. Different females, all belonging to the same species and consorting with the same males, parasitise redstarts, robins, dunnocks, wrens, reed warblers, great reed warblers, pied wagtails, and others. But each female parasitises only one of those host species. And the remarkable fact is that (with a few revealing exceptions) the eggs of each female cuckoo faithfully mimic those of the particular host in whose nest she lays them. The only consistent betrayer is that cuckoo eggs are slightly larger than the host eggs that they mimic. Even so, they are smaller than they ‘should’ be for the size of the cuckoo itself. Presumably, if the pressure to mimic drove them to be any smaller, the chicks would be penalised in some way. The actual size is a compromise between pressure to be small to mimic the host eggs, and an opposite pressure towards the larger optimum for the cuckoo’s own size.

I doubt that you’re wondering why egg mimicry benefits the cuckoos. Foster parents are mostly very good at spotting cuckoo eggs, and they often eject them. A cuckoo egg of the wrong colour would stand out like a sore thumb. Actually, that’s an unusually poor cliché. Have you ever seen a sore thumb, and did it stand out? Let’s initiate a new simile. Stands out like a baseball at Lord’s? Like a Golden Delicious in a basket of genuinely delicious apples? Just look at that cuckoo egg in the brambling nest and imagine transplanting it into the meadow pipit nest. Or vice versa. The host birds would unhesitatingly toss it out. Or, if tossing it out is too difficult, abandon the nest altogether. Such discrimination is not a surprise when you consider that bird eyes are acute enough to perfect the exquisitely detailed painting of lichen-mimicking moths and stick-mimicking caterpillars.

Foster parents, then, whether as automata or cognitively, can be expected to provide the selection pressure that explains why it might benefit cuckoo eggs to show such beautiful egg mimicry. They throw out eggs that don’t look like their own. But what is surprising, hugely so, is that cuckoos, all of one intrabreeding species, manage to mimic the eggs of many different foster parent species. To drive home the point, here’s yet another example: a reed warbler nest with, once again, wonderful egg mimicry by the single, slightly larger cuckoo egg.

These beautiful examples force us back to the central question of this whole discussion. How is it possible for female cuckoos, all belonging to the same species and all fathered by indiscriminate males, to produce eggs that match such a range of very different host eggs? Are we to believe that female cuckoos take one look at the eggs in a nest and take a decision to switch on some kind of alternative egg-colouring mechanism in the lining of the oviduct? That is improbable, to say the least. There are women who might love to control, by sheer willpower and for very different reasons, the behaviour of their own oviduct. But it’s not the kind of thing willpower does. And, with the best will in the world, it’s not clear how will will power it.

What is the true explanation for the female cuckoo’s apparent versatility? Nobody knows for sure, but the best available guess makes use of a peculiarity of bird genetics. As you know, we mammals determine our sex by the XX / XY chromosome system. Every woman has two X-chromosomes in all her body cells, so all her eggs have an X-chromosome. Every man has an X- and a Y-chromosome in all his body cells. Therefore, half his sperms are Y sperms (and would father a son when coupled with a necessarily X egg) and half are X sperms (would father a daughter when coupled with a necessarily X egg). Less well known is that birds have a similar system, but it evidently arose independently because it is reversed. The chromosomes are called Z and W instead of X and Y, but that’s not important. What matters is that in birds females are ZW and males are ZZ. That’s opposite to the mammal convention, but otherwise the principle is the same. Whereas the Y-chromosome passes only down the male line in mammals, in birds the W chromosome passes only down the female line. The W comes from the mother, the maternal grandmother, the maternal maternal great grandmother and so on back through an indefinite number of female generations.

Now recall the title of this chapter: ‘The Backward Gene’s-Eye View’. It’s all about genes looking back at their own history. Imagine you are a gene on the W-chromosome of a cuckoo, looking back at your ancestry. Not only are you in a female bird today, you have never been in a male bird. Unlike the other genes on ordinary chromosomes (autosomes), which have found themselves in male and female bodies equally often down the ages, the ancestral environments of the W-chromosome have been entirely confined to female bodies. If genes could remember the bodies they have sat in, the memories of W-chromosomes would be exclusively of female bodies not male ones. Z-chromosomes would have memories of both male and female bodies.

Hold that thought while we look at a more familiar kind of memory: memory by the brain, individual experience. It is a fact that female cuckoos remember the kind of nest in which they were reared, and choose to lay their own eggs in nests of the same foster species. Unlike the improbable feat of controlling your own oviduct, remembering early experience is exactly the kind of thing bird brains are known to do. When they come to choose a mate, as we saw in Chapter 7, birds of many species refer back to a kind of mental photograph of their parent, which they filed away in memory after their first encounter on hatching (‘imprinting’): even if – in the case of incubator-hatched goslings, for instance – what they later find attractive is Konrad Lorenz. To remember Lorenz, parental plumage, father’s song, or foster-parent’s nest – it’s all the same kind of problem. The same imprinting brain mechanism works well enough in nature even if, in captivity, it misfires.

I think you can see where this argument is going. Each female mentally imprints on the same foster nest as her mother; and therefore her maternal grandmother; and her maternal maternal great grandmother. And so on back. And her childhood imprinting leads her to choose the same kind of nest as her female forebears. So, she belongs to a cultural tradition going exclusively down the female line. Among females there are robin cuckoos, reed warbler cuckoos, dunnock cuckoos, meadow pipit cuckoos, etc., each with their own female tradition. But only females belong to these cultural traditions. Each cultural line of females is called a gens – plural gentes. A female may belong to the meadow pipit gens, or the robin gens, or the reed warbler gens, etc. Males don’t belong to any gens. They are descended from – and they father – females of all gentes indiscriminately.

Finally, we put these two strands of thought together, again in the light of the chapter’s title. With the exception of W-chromosome genes, all the genes in a female cuckoo look back through a chain of ancestors belonging to every gens that’s going. W-chromosomes aside, gentes are not genetically separate like true races, because males confound them. Only W-chromosome genes are gens-specific. Only W-chromosomes look back on ancestors of a particular gens to the exclusion of any other. We talked of two kinds of memory: genetic memory and brain memory. See how the two coincide where W-chromosome genes are concerned!

With respect to the W-chromosome, and only the W-chromosome, gentes are separate genetic races. So – I think you’ve already completed the argument yourself – if the genes that determine egg coloration and speckling are carried on the W-chromosome, it would solve the riddle we began with, the riddle of how it’s possible for the females of one species of cuckoo to mimic the eggs of a wide variety of host species. It isn’t willpower that chooses egg colour, it’s W-chromosomes.

You will have guessed that it’s not as simple as that. Things seldom are in biology. Although female cuckoos have a strong preference for their natal nest type when they come to lay, they occasionally make a mistake and lay in the ‘wrong’ nest, different from their natal nest. Presumably that’s how new gentes get their start. And not all gentes achieve good egg mimicry. Dunnock (hedge sparrow) eggs are a beautiful blue. But cuckoo eggs in dunnock nests aren’t blue (left). They aren’t even ‘trying’ to be blue, we might say. The cuckoo egg in the picture stands out like a sore … like a bloodhound in a pack of dachshunds. Are cuckoos, perhaps, constitutionally incapable of making blue eggs? No. Cuculus canorus in Finland has achieved a most beautiful blue, in perfect mimicry of redstart eggs (right). So why don’t cuckoo eggs mimic dunnock eggs? And how do they get away with it? The answer is simple, although it remains puzzling. Dunnocks are among several species that don’t discriminate, don’t throw out cuckoo eggs. They seem blind to what looks to us glaringly obvious. How is this possible, given that other small songbirds have powers of discrimination acute enough to perfect the finishing touches to the egg mimicry achieved by their respective gentes of female cuckoos? And given that bird eyes are capable of perfecting the detailed mimicry of stick caterpillars, lichen-mimicking moths, and the like?

Cuckoos and their hosts, as with stick caterpillars and their predators, are engaged in an ‘evolutionary arms race’ with one another. As mentioned in Chapter 4, arms races are run in evolutionary time. It’s a persuasive parallel to human arms races, which are run in ‘technological time’, and a lot faster. The aerial swerving and dodging chases of Spitfires and Messerschmitts were run in real time measured in split seconds. But in the background and more slowly, in factories and drawing-offices in Britain and Germany, races were run to improve their engines, propellers, wings, tails, weaponry, etc., often in response to improvements on the other side. Such technological arms races are run over a timescale measured in months or years. The arms races between cuckoos and their various host species have been running for thousands of years, again with improvements on each side calling forth retaliatory improvements in the other.

Nick Davies and his colleague Michael Brooke suggest that some gentes have been running their respective arms races for longer than others. Those against meadow pipits and reed warblers are ancient arms races, which is why both sides have become so good at outdoing the other – and therefore why the cuckoo eggs are such good mimics. The arms race against dunnocks, they suggest, has only just begun. Not enough time for the dunnocks to evolve discrimination and rejection. And not enough time for the dunnock gens of cuckoos to evolve the appropriate blue colour.

If it’s true that cuckoos have only just ‘moved into’ dunnock nests, we must suppose that these ‘pioneer’ cuckoos have ‘migrated’ from another host species, presumably one with rusty-spotted grey eggs because that’s the egg colour of the ‘newly arrived’ dunnock gens of cuckoo. I suppose this is how any new gens gets its start. But don’t be misled by ‘pioneer’ and ‘migrated’. It would not have been any kind of bold decision to sally forth into fresh nests and pastures new. It would have been a mistake. As we’ve seen, cuckoos do indeed occasionally lay an egg in the wrong kind of nest, a nest appropriate to a different gens. Their egg then really does stand out like a … invent your own substitute for the sore thumb cliché. Natural selection normally penalises such blunders, we can presume, pretty promptly. But what if it’s a new host species that hasn’t yet been ‘invaded’ by cuckoos? The new host species is naive. They haven’t hitherto had any reason to throw out mismatched eggs. Once again, remember, birds are not little feathered humans with human judgement. The arms race has yet to get properly under way. And the host species can expect to remain naive while the arms race is yet young. But how young is young? Strangely enough, we are not totally without evidence bearing on the question, as Nick Davies points out.

Call the witness Geoffrey Chaucer. In The Parlement of Foules (1382), the cuckoo is reproached: ‘Thou mordrer of the heysugge on the braunche that broghte thee forth.’ Another name for dunnock is hedge sparrow or, in Middle English, heysugge (heysoge, heysoke, eysoge). This would seem to suggest that cuckoos were already parasitising dunnocks in the fourteenth century, when Chaucer wrote. Is 650 years long enough for an arms race to reach some sort of perfection of mimicry? Perhaps not, given that, as Davies points out, only 2 per cent of dunnock nests are parasitised. Maybe, then, the selection pressure is so weak that a 650-year-old arms race is indeed young.

I would add two further suggestions. The first concerns identification. Did Chaucer really mean dunnock when he said heysugge? When we say ‘sparrow’ we normally mean the house sparrow, Passer domesticus, not the hedge sparrow or dunnock, Prunella modularis. Yet the English word ‘sparrow’ is used for both. To many who are not avid twitchers, all little brown birds (LBBs) look much the same, and we might even sink so low as to call them all ‘sparrows’. I can’t help wondering whether Chaucer was using ‘heysugge’ to mean LBB rather than specifically Prunella modularis.

My second suggestion is more biologically interesting. If we think carefully about it, there’s no reason, is there, to suppose that there’s only one cuckoo gens for each host species? Maybe Chaucer’s gens of dunnock cuckoos has died out, and a new gens of dunnock cuckoos is just beginning its arms race. Perhaps other gentes of dunnock cuckoos have perfect egg mimicry today, but have not come to the notice of ornithologists. There would be no relevant gene flow between them because males don’t have W-chromosomes.

Claire Spottiswoode and her colleagues are running a parallel study of an unrelated South African finch, which convergently evolved the cuckoo habit. The cuckoo finch, Anomalospiza imberbis, lays its eggs in the nests of grass warblers. Different gentes of cuckoo finch mimic the eggs of different grass warbler species. There is genetic evidence that what distinguishes the gentes is indeed their W-chromosomes, which reinforces the idea that the same thing is going on in cuckoos. As Dr Spottiswoode points out, this doesn’t have to mean that every detail of all the egg colours is carried on the W-chromosome. In both cuckoos and cuckoo finches, genes for making all the different egg colours have very probably been built up on other chromosomes (‘autosomes’) over many generations, and are carried by all the gentes and passed on by males as well as females. The W-chromosome need only have switch-genes – genes that switch on or off whole suites of genes carried on autosomes. And the relevant autosomal genes would be carried by males as well as females.

This is indeed how sex itself is determined. If you have a Y-chromosome, you have a penis. If you have no Y-chromosome, you have a clitoris instead. But there’s no reason to suppose that the genes that influence the shape and size of a penis are confined to the Y-chromosome. Far from it. It’s entirely plausible that they are scattered over many autosomes. There’s no reason to doubt that a man may inherit genes for penis size from his mother as well as from his father. Presence or absence of a Y-chromosome determines only which alternative suite of genes on autosomes will be switched on. For most purposes you can think of the entire Y-chromosome as a single gene that switches on suites of other genes on autosomes elsewhere in the genome. A point of terminology: members of these suites of autosomal genes are called ‘sex-limited’ as distinct from ‘sex-linked’. Sex-linked genes are those that are actually carried on the sex-chromosomes themselves.

Probably the best guess towards a solution of the riddle of cuckoo egg mimicry is that suites of genes on lots of chromosomes determine egg coloration and spotting. These are equivalent to ‘sex-limited’, and we may call them ‘gens-limited’. They are switched on or off by the presence or absence of one or more genes on the W-chromosome, genes that, by analogy, we can call ‘gens-linked’. All cuckoo autosomes may have suites of genes for mimicking a whole repertoire of host eggs. W-chromosomes contain switch genes that determine which suite of genes is turned on. And it is W-chromosomes that are peculiar to each gens of females, W-chromosomes that look back at their history and see a long line of nests of only one foster species.
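The switch-gene idea can be made concrete with a toy simulation. What follows is only a minimal sketch under stated assumptions, not a model taken from the research described here: the egg-pattern labels, the population sizes, and the names `EGG_SUITES`, `offspring`, `egg_pattern`, and `simulate` are all invented for illustration. Every bird is assumed to carry the full repertoire of autosomal egg-pattern suites, the W-chromosome is reduced to a single switch passed from mother to daughter, and males mate at random across gentes.

```python
import random

# Hypothetical autosomal suites: in this sketch every bird, male or female,
# is assumed to carry the whole repertoire ('gens-limited' genes).
EGG_SUITES = {
    "meadow pipit": "meadow-pipit-like eggs",
    "reed warbler": "reed-warbler-like eggs",
    "redstart": "redstart-like eggs",
}
GENTES = list(EGG_SUITES)


def offspring(mother, father, sex):
    """Make one chick. The W switch passes only from mother to daughter;
    autosomal ancestry is pooled from both parents."""
    return {
        "sex": sex,
        "w_switch": mother["w_switch"] if sex == "F" else None,
        "ancestral_gentes": mother["ancestral_gentes"] | father["ancestral_gentes"],
    }


def egg_pattern(female):
    """The W switch ('gens-linked') selects which autosomal suite is turned on."""
    return EGG_SUITES[female["w_switch"]]


def simulate(generations=10, per_gens=20, seed=1):
    """Breed for several generations with males mating indiscriminately."""
    rng = random.Random(seed)
    females = [
        {"sex": "F", "w_switch": g, "ancestral_gentes": frozenset([g])}
        for g in GENTES for _ in range(per_gens)
    ]
    males = [
        {"sex": "M", "w_switch": None, "ancestral_gentes": frozenset([g])}
        for g in GENTES for _ in range(per_gens)
    ]
    for _ in range(generations):
        next_females, next_males = [], []
        for mother in females:
            father = rng.choice(males)  # any male, whatever nest reared him
            next_females.append(offspring(mother, father, "F"))
            next_males.append(offspring(mother, father, "M"))
        females, males = next_females, next_males
    return females
```

Calling `simulate()` and inspecting the returned females shows the two kinds of ancestry side by side. Each female's `ancestral_gentes` set tends to fill up with several gentes, because fathers are drawn at random, while her `w_switch`, and therefore her `egg_pattern`, is always that of her mother's line.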

This interpretation of egg mimicry in cuckoos is my introduction to the topic of the backward gene’s-eye view, genes looking over their shoulder at their own ancestry. Here’s a similar but more complicated example involving fish and the Y-chromosome. Different kinds of fish display a bewildering variety of sex-determining systems. Some don’t use sex chromosomes at all but determine sex by external cues. Some fish are like birds in that females are XY and males are XX. Others are like us mammals: males are XY, females XX. Among these are small fish of the genus Poecilia, which includes mollies and guppies among popular aquarium fish. One species, Poecilia parae, has a remarkable colour polymorphism, which affects only the males. Polymorphism means that there are different genetically determined colour types coexisting in the population (in this case five colour patterns) and the proportions of the different types remain stable in the population through time. All five male morphs can be found swimming together in South American streams. There’s only one female morph: females look alike.

Since the polymorphism affects only one sex, we can call them five gentes, by analogy with the cuckoos, with the difference that in these fish it’s the males who are separated by gens. The picture shows the five male types plus a female at the bottom. Three of the five male types have two long stripes like tramlines. Between the tramlines there is colour, and I’ll call them reds, yellows, and blues respectively. These three ‘tramliners’ can, for many purposes, be lumped together. The fourth type has vertical stripes. They’re officially named ‘parae’, but confusingly that’s also the name of the whole species. I’ll call them ‘tigers’. The fifth type, ‘immaculata’, is relatively plain grey, like females but smaller, and I’ll call them ‘greys’.

Tigers are the largest. They behave aggressively, chasing rival males away, and copulating with females by force. Greys are the smallest, and they manage to copulate only by occasionally sneaking up on females opportunistically. When they get away with it, it seems to be because otherwise aggressive males mistake them for females, which they do indeed resemble. Greys have the largest testes, presumably capable of producing the most sperm, perhaps to take advantage of their scarce opportunities to use it. Red, yellow, and blue tramliners are of intermediate size. Rather than rape or sneak, they court females in a civilised manner, displaying their respective coloured flanks.

Male ‘gentes’ in fish? (Figure labels: Tiger, Grey, Blue, Yellow, Red, Female.)

Now here’s where the parallel to cuckoos kicks in. Evidence suggests that colour morph inheritance runs entirely down the male line. In every case studied, sons belong to the same type as their father, and therefore paternal grandfather, paternal paternal great grandfather, etc. Their mother has no genetic say in the matter, and nor does their maternal grandfather, etc., even though each one belongs to one colour gens or another. This suggests the hypothesis that the five types of males differ with respect to their Y-chromosomes – just as gens-inheritance in female cuckoos seems to be carried on the W-chromosome. The details of colour pattern and behaviour of the male fish may be carried in suites of genes on autosomes (gens-limited). But the genes determining which gens an individual belongs to (and presumably which suite of colour and pattern genes on other chromosomes is switched on) seem to be gens-linked, that is, carried on the Y-chromosome.

Researchers are doing fascinating work on mate choice in these fish and are homing in on what maintains the polymorphism. It seems that each of the five male types has an equilibrium frequency, fitting the definition of a true polymorphism. If its frequency falls below the equilibrium, it is favoured and therefore becomes more frequent in the population. If its frequency rises too high, it is penalised and its frequency decreases. This so-called ‘frequency-dependent selection’ is a known way for polymorphisms to be maintained in a population. How might it work in practice? The details are not yet clear but might look something like this. The grey sneakers benefit from being mistaken for females. If they become too frequent, perhaps the real females or aggressive tigers get ‘wise’ to them. How about the tigers themselves? If they get too frequent, they waste time fighting each other instead of mating. This might give the greys more opportunity to sneak matings. As for the three ‘tramliners’, who court females in a gentlemanly manner by flashing their vividly coloured flanks, there is some evidence that females prefer rarer types. This would fit the ‘equilibrium frequency’ idea, although it’s not clear why females should exhibit such a preference. More research is needed and is under way now. I am grateful to Dr Ben Sandkam, formerly of the University of British Columbia and now at Cornell, for sharing with me his thoughts on these matters.
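The logic of frequency-dependent selection can be sketched numerically. This is an illustration only: the equilibrium frequencies, the starting frequencies, the `strength` parameter, and the simple linear fitness rule are all assumptions made for the sketch, not measurements from Poecilia parae. A morph is given above-average fitness when it is rarer than its assumed equilibrium frequency and below-average fitness when it is commoner.

```python
# Hypothetical equilibrium frequencies for the five male morphs (must sum to 1).
EQUILIBRIUM = {"tiger": 0.15, "grey": 0.10, "red": 0.25, "yellow": 0.25, "blue": 0.25}

# Hypothetical starting point: far too many tigers.
START = {"tiger": 0.60, "grey": 0.10, "red": 0.10, "yellow": 0.10, "blue": 0.10}


def step(freqs, equilibrium, strength=0.5):
    """Advance one generation under negative frequency-dependent selection."""
    # Assumed fitness rule: favoured when below equilibrium, penalised when above.
    fitness = {m: 1.0 + strength * (equilibrium[m] - freqs[m]) for m in freqs}
    mean_fitness = sum(freqs[m] * fitness[m] for m in freqs)
    # Standard discrete-generation update: weight each morph by relative fitness.
    return {m: freqs[m] * fitness[m] / mean_fitness for m in freqs}


def run(freqs, equilibrium, generations=2000, strength=0.5):
    """Iterate the update rule for a number of generations."""
    for _ in range(generations):
        freqs = step(freqs, equilibrium, strength)
    return freqs


if __name__ == "__main__":
    final = run(START, EQUILIBRIUM)
    for morph, freq in final.items():
        print(f"{morph}: {freq:.3f}")
```

In this toy model the overabundant tigers decline and the rarer morphs recover until every morph sits at its assumed equilibrium frequency. It says nothing about why the real fish behave this way. The biological causes suggested above (females or tigers getting ‘wise’ to sneakers, tigers wasting time fighting, females preferring rarer types) are all folded into the single fitness rule.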

Now let’s again apply the backward-looking technique of this chapter. Every male of Poecilia parae can look back through a long line of male ancestors, all belonging to the same gens as him, and all sharing the same Y-chromosome. This is what makes it possible for suites of genes for colour patterning and associated behaviour to become switched on in separate gentes of males, despite their sharing the same ancestors in the female line. The gene’s-eye view of the past comes into its own again, as with the cuckoos. Autosomal genes, governing characteristics other than gens-specific colour, look back on ancestors of all gentes.

Returning to cuckoos, the ‘looking back’ ploy can help us answer another riddle, and it’s an even tougher one. Although most host species are very good at distinguishing cuckoo eggs from their own (how else could natural selection have perfected cuckoo egg mimicry?), they turn out to be lamentably bad later, failing to notice that the growing cuckoo fledgling is an impostor. Even though it dwarfs them, in most cases grotesquely so. A tiny warbler is in danger, you might think, of being swallowed whole by its monstrous foster child. Foster parents, of whatever species, end up dwarfed by the cuckoo nestling into whom they tirelessly shovel food, working every devoted daylight hour to do so. How do the cuckoo nestlings get away with such a transparent, over-the-top deception? Once again, we have to be more than usually on our guard against anthropomorphism. Do not ask whether the bird’s behaviour makes sense from a human-like cognitive perspective. Of course it doesn’t. Ask instead about selection pressures acting on ancestral genes that control the development of behavioural automatisms.

A warbler feeding a cuckoo

Even given this preliminary, I must admit that available answers to the riddle epitomised by the picture above remain unsatisfying, compared to the explanations that I am accustomed to offering in my books. And indeed, compared to the explanation of egg mimicry. But here’s the best explanation – or series of partial explanations – I can find. We return to the idea of the arms race. In our 1979 paper, John Krebs and I considered ways in which an arms race might end in ‘victory’ for one side (here again, the quotation marks are strongly advised). We identified two principles, the ‘Life Dinner’ and the ‘Rare Enemy’ principle. These are closely related, maybe just different aspects of the same thing.

In one of Aesop’s Fables, a hound was pursuing a hare, got tired and gave up. Taunted for his lack of stamina, the hound replied, ‘It’s all very well for you to laugh, but we had not the same stake at hazard. He was running for his life, but I was only running for my dinner.’

As in military arms races, predators and prey must balance design improvements and resources against economic costs. The more they put into servicing the arms race – muscles, lungs, heart, the machinery of speed and endurance – the less is available for other aspects of life such as making eggs or milk, building up fat reserves for the winter etc. In the language of Darwinism, Aesop’s hares have been subject to stronger selection to invest resources into the arms race than the hounds. There is an asymmetry in the cost of failure – loss of life versus mere loss of dinner. The failed predator lives to pursue another prey. The failed prey has fled its last pursuer. But now, notice how we can say the same thing more piercingly in the language of the genetic book of the dead. The predator’s genes can look back on ancestors many of whom were outrun by prey. But not one of the prey’s ancestors was outrun by a predator. At least not before it had passed on its genes. Plenty of predator genes can look back on ancestors who failed to outrun prey. Not a single prey gene can look back on ancestors who had lost a race against a predator.

Apply the Life Dinner Principle to the cuckoo nestling and its host. The cuckoo nestling can look back on an unbroken line of ancestors, literally not a single one of whom was outwitted by a discriminating host. If it had been, it would not have become an ancestor. Cuckoo genes for failing to fool hosts are never passed on. But genes that lead foster parents to fail to notice cuckoos? Plenty of hosts who were fooled by cuckoos could live to breed again. Genetic tendencies among hosts to be fooled by cuckoos can be passed on. Genetic tendencies among cuckoos to fail to fool hosts are never passed on. It’s the Life Dinner Principle in operation.

Moreover, the host can look back on ancestors many of whom may never have met a cuckoo in their lives. In Nick Davies and Michael Brooke’s long-running study on Wicken Fen, only 5 to 10 per cent of reed warbler nests were parasitised by cuckoos. And this brings us to the Rare Enemy Effect. Cuckoos are comparatively rare. Most reed warblers, wagtails, pipits, dunnocks, etc. probably get through their lives and successfully reproduce without ever encountering a cuckoo. They may look back on many ancestors who never encountered a cuckoo in their lives. But every single cuckoo looks back at an unbroken line of ancestors who successfully fooled a host into feeding them. Asymmetries of this kind could favour ‘victory’ such that even a monstrous cuckoo nestling gets away with fooling its diminutive foster parent. The selection pressure to outwit cuckoos is weak compared to the selection pressure on cuckoos to do the outwitting.

Another parable with an Aesopian flavour is the fable of the boiled frog. A frog dropped into very hot water might do anything in its power to jump out. But a frog in cold water that is slowly heated up does not notice until it is too late. When the baby cuckoo first hatches, the deceiver is indistinguishable from the real thing. As it gradually grows, there is no one day when it suddenly becomes obvious that it’s a fake. Just as there’s never a day when a baby becomes a child; or a child a teenager; or a middle-aged man old. Every day, it looks much the same as the day before. Perhaps this helps the outwitting. Note that the boiled frog effect doesn’t apply to eggs. A cuckoo egg suddenly appears in the nest. It doesn’t gradually become more and more imposterish like a cuckoo nestling.

In another pair of papers already mentioned, Krebs and I proposed that animal communication in general can be seen as manipulation. I discussed this in Chapter 7 in connection with nightingale song bewitching John Keats. Birdsong is known to cause female gonads to swell. This is an example of what we called manipulation. It will not always be to the female’s advantage to submit to it. There will be an arms race between salesmanship and sales-resistance, each side escalating in response to the other. What tricks of salesmanship might the cuckoo nestling employ, in response to the sales-resistance of the host? They’d need to be pretty powerful to outweigh the eventually incongruous mismatch in size between foster parent and cuckoo nestling. But that’s no argument against their existence.

All nestlings open their gapes wide and squawk their appeals for food. If you’re a baby reed warbler, say, the louder you cry, the more likely you are to persuade your parent to drop food into your gape rather than a sibling’s (and there is indeed good Darwinian reason for competition among siblings, even real gene-sharing siblings). On the other hand, loud vocalisation costs vital energy. This applies to baby birds as much as to adults. In one study of wrens at Oxford, the researcher allowed himself to speculate that a male literally sang himself to death. The calling rate and loudness of a baby reed warbler will normally be regulated to an optimum level: enough to compete with siblings, but not so much as to overtax itself or attract predators. The oversized baby cuckoo needs as much food as four young reed warblers. It urges the foster parent on by sounding like a clutch of reed warbler chicks rather than just one very loud reed warbler chick.

Among the ingenious field experiments done by Nick Davies, he and his colleague Rebecca Kilner put a blackbird nestling in a reed warbler nest. The young blackbird was about the same size as a cuckoo nestling. The reed warblers fed it, but at a lower rate than they would normally feed a baby cuckoo. Then the experimenters played their masterstroke: a sound recording of a baby cuckoo piped through a little loudspeaker next to the nest, switched on whenever the baby blackbird was seen to beg. Now the reed warbler adults upped the rate with which they fed the blackbird chick, to a rate appropriate to a baby cuckoo – the same rate as for a clutch of baby reed warblers. And indeed, a recording of four baby reed warblers crying had the same effect. It would seem that baby cuckoo squawks have evolved to become a super-stimulus. Super-stimuli are well attested in experiments on bird behaviour. My old maestro Niko Tinbergen reported that oystercatchers, offered a choice, will preferentially attempt to incubate a dummy egg eight times the volume of their own egg. It’s called a supernormal stimulus. Something like this is what we’d expect as the culmination of an evolutionary arms race, with escalating salesmanship on the cuckoo’s side keeping pace with escalating sales-resistance on the part of the foster parents.

How about a visual equivalent of such a super-stimulus? The open beak of all nestlings is conspicuous, often bright yellow, orange, or red. Doubtless such bright coloration persuades the parents to drop food in, the brighter the gape the greater the chance of their favouring this gape rather than a sibling’s. Reed warbler chicks have a yellow gape. Davies and colleagues found that reed warbler parents gauge their food-fetching efforts according to the total area of yellowness gaping at them in the nest, and also to the rate of begging cries. Cuckoo chicks have a red gape. Is this, perhaps, a stronger stimulus than yellow? An experiment with painted gapes failed to support the hypothesis. Is the cuckoo gape, then, larger than a reed warbler chick’s gape? Yes, cuckoo chicks have a bigger gape than any one reed warbler chick. But its area is not equal to the sum of four reed warbler chicks – perhaps closer to two. Cuckoo chicks use sound to compensate for this, and by two weeks of age a cuckoo chick sounds like a clutch of reed warbler chicks. The combination of a somewhat bigger gape than one reed warbler chick’s, together with supernormal begging cries, is just enough to persuade the adult reed warblers to pump into the cuckoo chick as much food as they would normally bring to a whole clutch of their own chicks. Once again, we could see the supernormal begging call as the end product of an escalating arms race between salesmanship and sales-resistance.

A cardinal feeding a goldfish

That birds are susceptible to large gapes – even the alien gape of a fish – is shown by the well-attested observation of a cardinal (an American bird) repeatedly dropping food into the open mouth of a goldfish. We view the scene through human eyes and think, how absurd, how could a bird be so stupid? But the example of the oystercatcher sitting on the giant egg should warn us that human eyes are precisely what we should not trust. We have no right to be sarcastic. Birds are not little humans, cognitively aware of what they are doing and why they are doing it. And after all, a human male can be sexually aroused by a supernormal caricature of a female, even though he is well aware that it is a drawing on two-dimensional paper, with unnaturally exaggerated features, and a fraction of normal size. The baby cuckoo has no idea what it is doing when it tosses eggs out of the nest. Think of it as a programmed automaton. The oystercatcher does not know why it sits on a giant egg. Think of it as a pre-programmed incubation machine. And in the same way, think of a parent bird as a robot mother, programmed to drop food into wide-open gapes, however ridiculous it may seem to us when the gape belongs to a fish. Or to the giant imposter who is a nestling cuckoo.

If cuckoo nestlings have a supernormal gape, mimicking two ordinary chicks, there’s an Asian cuckoo, Horsfield’s hawk cuckoo, Cuculus fugax, that goes one better. It has the visual equivalent of a clutch of gapes. In addition to its yellow gape, it has a pair of dummy gapes: a patch of bare skin on each wing, the same yellow colour as the real gape. It waves the wing patches about, usually one at a time, next to the real gape. The foster parent (a species of blue robin was the host in this Japanese study by Dr Keita Tanaka) is stimulated by the double whammy of gape plus patch. Dr Tanaka has kindly sent me several photographs plus some amazing film footage. As soon as the foster parent flies in, the cuckoo chick dramatically raises its right wing and waves it about. The gesture reminds me of a swordsman raising his shield to intercept an attack. But this analogy has it exactly wrong. The point is not to repel but to attract. One film even shows the robin vigorously stuffing food up against the yellow patch on the upheld right wing, before turning and shoving it into the wide-open gape instead. The Japanese researchers ingeniously blacked out the wing patch, and this reduced the feeding rate by the robins. There’s a similar story for another brood parasite, the whistling hawk cuckoo, Hierococcyx nisicolor, in China. Like the Horsfield’s hawk cuckoo, the nestlings have yellow wing patches that they display in the same way, to fool foster parents.

So much for cuckoos, not deplorable because a true wonder of nature and natural selection. Now, let’s see what else we can do with the notion of genes looking over their shoulder.

Horsfield’s hawk cuckoo with fake gape on wing

11 More Glances in the Rear-View Mirror

Where once they would have talked of the good of the species, nowadays essentially all serious biologists studying animal behaviour in the wild have adopted what I am calling the gene’s-eye view. Whatever the animal is doing, the question these modern workers ask is, ‘How does the behaviour benefit the self-interested genes that programmed it?’ David Haig, now at Harvard University, is one of those pushing this way of thinking towards the limit, illuminating a great diversity of topics, including some important ones that doctors should care about, such as problems of pregnancy.

Among other things, Haig noticed a lovely example of genes looking backwards – actually at the immediate past generation. There’s a phenomenon called genomic imprinting. A gene can ‘know’ (by a chemical marker) whether it came from the individual’s father or mother. As you can imagine, this radically changes the ‘strategic calculations’ whereby a gene looks after its own self-interest. Haig shows how genomic imprinting changes how a gene views kin. Normally, a gene for kin altruism should regard a half-sibling as equivalent to a nephew or niece – half the value of a full sib or offspring. But if the altruistic gene ‘knows’ it came from the mother and not the father, it should see a maternal half-sibling as equal to its own offspring, or to a normal full sibling. The other way round if it ‘knows’ it came from the father. It should then see the maternal half-sibling as equivalent to an unrelated individual. Genomic imprinting opens up a whole lot of ways in which genes within an individual can come into conflict with one another, the topic of Burt and Trivers’ book Genes in Conflict. Haig goes so far as to blame warring genes for the familiar psychological sensation of being pulled in two directions at once, as in short-term gratification versus longer-term benefit. Genomic imprinting provides a stark example of how a gene might look in the ‘rear-view mirror’. Other examples constitute the topics of this chapter.

A gene on a mammalian Y-chromosome ‘looks back’ at an immensely long string of ancestral male bodies and not a single female one, probably as far back as the dawn of mammals if not further. Our mammal Y-chromosome has been swimming in testosterone for perhaps 200 million years. But if Y-chromosomes look back at only male bodies, what about X-chromosomes? If you are a gene on an X-chromosome, you might come from the animal’s father, but you are twice as likely to come from its mother. Two-thirds of your ancestral history has been in female bodies, one-third in male bodies. If you are a gene on a chromosome other than a sex chromosome, an autosome, half your ancestral history was in female bodies, half in male bodies. We should expect many autosomal genes to have sex-limited effects, programmed with an IF statement: one effect whenever they find themselves in a male body, a different effect when in a female body.
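The two-thirds figure for X-chromosome genes can be checked with a short calculation. What follows is a minimal sketch, not anything taken from the text: the function name and its parameters are invented for illustration, and it assumes equal numbers of males and females and no other complications. It steps back one generation at a time, asking how likely the gene is to have sat in a female body.

```python
def female_ancestry_fraction(p_mother_if_female, p_mother_if_male, generations=60):
    """Long-run fraction of a gene's ancestral bodies that were female.

    p_mother_if_female: chance that a copy now in a female came from her mother.
    p_mother_if_male:   chance that a copy now in a male came from his mother.
    (Hypothetical helper, written only to illustrate the argument.)
    """
    f = 0.5  # arbitrary starting guess for the fraction of time in female bodies
    for _ in range(generations):
        # One generation back, the gene was in a female body exactly when
        # the current body inherited it from its mother.
        f = f * p_mother_if_female + (1 - f) * p_mother_if_male
    return f

# A female's X copy is equally likely to be maternal or paternal;
# a male's single X always comes from his mother.
x_fraction = female_ancestry_fraction(0.5, 1.0)

# An autosomal copy is equally likely to be maternal or paternal in either sex.
autosome_fraction = female_ancestry_fraction(0.5, 0.5)

print(round(x_fraction, 4), round(autosome_fraction, 4))
```

Under these assumptions the X-chromosome value settles at two-thirds and the autosomal value at one-half, matching the fractions given above.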

But when any gene looks back at the male bodies that it has inhabited, what it sees will not be a random sample of male bodies but a restricted sub-set. This is because the average male is often denied the Darwinian privilege of reproduction. A minority of males monopolises the mating opportunities. Most females, on the other hand, enjoy close to the average reproductive success. Red deer stags with large antlers prevail in fights over access to females. So when a red deer gene looks back at its male ancestors, it will see the minority of male bodies that are topped by abnormally large antlers.

Even more extreme is the asymmetry shown by seals, especially Mirounga, the elephant seal. There are two species: the southern elephant seal, which I have seen, close enough to touch (though I would not), on the remote island of South Georgia, and the northern elephant seal, which Burney Le Boeuf has thoroughly studied on the Pacific beaches of California. Like many mammals, elephant seals have harem-based societies but they carry it to an extreme. Successful males, ‘beachmasters’, are gigantic: up to 4 metres long and weighing 2 tonnes. Females are relatively small and are gathered into harems, which may typically number as many as fifty ‘belonging to’, and vigorously defended by, a single dominant male. Most of the males in the population have no harem, and either never reproduce or bide their time hoping to sneak an occasional copulation, as well as aspiring eventually to get big and strong enough to displace a beachmaster. In one report from Le Boeuf’s long-term California study of northern elephant seals, only eight males inseminated an astonishing 348 females. One male inseminated 121 females, while the great majority of males had no reproductive success at all. An elephant seal gene on a Y-chromosome looks back at, not just a long sequence of male bodies, but specifically at the overgrown, blubbery, belching, bloated bodies of a tiny minority of dominant, harem-holding beachmasters: highly aggressive males, over-endowed with testosterone and with the dangling trunks used as living trombones to resonate roars that intimidate other males. On the other hand, an elephant seal gene will look back at a succession of female bodies that are close to the average.

Do you find something puzzling about the fact that only a small minority of males does almost all the fathering? Isn’t it terribly wasteful? Think of all those bachelor males, consuming a fat slice of the food resources available to the species, yet never reproducing. A ‘top-down’ economic planner with species welfare in mind would protest that most of those males shouldn’t be there. Why doesn’t the species evolve a skewed sex ratio such that only a few males are born: just enough males to service the females, the same number of males as would normally hold harems? They wouldn’t have to fight each other, they’d all get a harem as a matter of automatic entitlement, just for being male. Wouldn’t a species with such an economically sensible, planned economy prevail over the present, wildly uneconomical, strife-ridden species? Wouldn’t the planned economy species win out in natural selection?

Sexual inequality on the beach

Yes, if natural selection chose between species. But, contrary to a widespread misunderstanding, it doesn’t. Natural selection chooses between genes, by virtue of their influence on individuals. And that makes all the difference. If the sensible planned economy were to come about by Darwinian means, it would have to be through the natural selection of genes controlling the sex ratio. This is not impossible. A gene could bias the number of X sperms versus Y sperms produced by males. Or it could favour selective abortion of some male foetuses. Or it could favour starving some baby sons to death and keeping just a favoured few. Never mind how it does it, just call this hypothetical gene the Planned Economy Gene, pegged to top-down common sense.

Imagine a planned economy population where most of the individuals are female, say one male for every ten females. This is the kind of population our sensible economist would expect to see. It is economically sensible because food is not wasted on males who are never going to reproduce. Now imagine a mutant gene arising, a mutation that biases individuals towards having sons. Will this male-favouring gene spread through the population? Alas for the planned economy, it certainly will. In the planned economy, females outnumber males ten to one, so a typical male can expect ten times as many descendants as a typical female. It’s a bonanza for males. The son-biased mutant gene will spread rapidly through the population. And the males will have good reason to fight. It’s the flip side of our observation that our hypothetical gene looks back at a successful minority of male bodies, not at an average sample of male bodies.

Will the population sex ratio swing right round to the opposite extreme and become male-biased? No, natural selection will stabilise the sex ratio we actually see, a 50/50 sex ratio (but see the important reservation below) with a minority of harem-holding males and a majority of frustrated bachelors. Here’s why. If you have a son, there’s a good chance he’ll end up a disconsolate bachelor who’ll give you no grandchildren. But if your son does end up a harem-holder, you’ve hit the jackpot where grandchildren are concerned. The expected reproductive success of a son, averaged over his slim chance of the jackpot plus the much greater chance of bachelor misery, equals the expected average reproductive success of a daughter. Equal sex ratio genes prevail, even though the society they create is so horribly uneconomical. Sensible as it sounds, the ‘planned economy’ cannot be favoured by natural selection. In this respect at least, natural selection is not a ‘sensible’ economist.
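The bookkeeping behind this argument is simple enough to sketch in a few lines. The population sizes below are invented for illustration, and the function name is hypothetical. The only fact relied on is that every offspring has exactly one father and one mother, so each sex, taken as a whole, parents the entire next generation.

```python
def mean_offspring_per_parent(n_males, n_females, n_offspring):
    """Average number of offspring credited to a male and to a female.

    Every offspring has one father and one mother, so the males collectively
    parent all n_offspring, and so do the females.
    """
    return n_offspring / n_males, n_offspring / n_females

# Hypothetical 'planned economy' population: one male for every ten females.
planned_male, planned_female = mean_offspring_per_parent(10, 100, 200)

# Hypothetical population with an equal sex ratio.
equal_male, equal_female = mean_offspring_per_parent(100, 100, 200)

# Harem-style skew: one male fathers everything, the other 99 father nothing.
# The average payoff of a son is unchanged by how unevenly it is shared out.
skewed_payoffs = [200] + [0] * 99
skewed_mean = sum(skewed_payoffs) / len(skewed_payoffs)

print(planned_male / planned_female, equal_male / equal_female, skewed_mean)
```

In the planned economy a son is worth ten daughters on average, so son-biasing genes spread. At an equal sex ratio the averages match, however unequally the males' share is divided among them.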

I said that selection would stabilise the sex ratio at 50/50 but I added a cautionary reservation. There are various reasons for that caution, and they are important. Here’s one of them. Suppose it costs twice as much to rear a son as to rear a daughter. To equip a son to fight off rivals and win a harem, he must be big. Being big doesn’t come free. It costs food. If a mother seal must suckle a son for longer than a daughter, if a son costs twice as much as a daughter to rear, the ‘choice’ facing the mother is not ‘Shall I have a son or a daughter’ but ‘Shall I have a son or two daughters?’ The general principle, first clearly understood by RA Fisher, is that the sex ratio stabilised by natural selection is 50/50 measured in economic expenditure on daughters versus economic expenditure on sons. That will amount to 50/50 in numbers of male and female bodies, only if the cost of making sons and daughters is the same. Fisher’s principle balances what he called parental expenditure on sons versus daughters. This may cash out in the form of equal numbers of males and females in the population, but only if sons and daughters are equally costly to rear. There are other complications, some pointed out by WD Hamilton, but I won’t stay to deal with them.
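The 'son or two daughters' choice can be written as a one-line balance. This is a simplified sketch with symbols that are not in the text: m and f are the numbers of breeding males and females, N is the number of offspring in the next generation, and c_s and c_d are the costs of rearing one son and one daughter. It assumes the ratio among offspring reared carries through to the breeding population. The expected return on a son is N/m and on a daughter N/f, so selection stops shifting investment when the return per unit of expenditure is the same for both:

```latex
\frac{N}{m\,c_s} = \frac{N}{f\,c_d}
\quad\Longrightarrow\quad
m\,c_s = f\,c_d
\quad\Longrightarrow\quad
\frac{m}{f} = \frac{c_d}{c_s} = \frac{1}{2}
\quad \text{when } c_s = 2\,c_d .
```

Total expenditure on sons, m c_s, equals total expenditure on daughters, f c_d, even though in this example the bodies are one male to every two females.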

Elephant seals are an extreme example of a principle that typifies many mammal species. Females tend to have nearly the same reproductive success as each other, close to the population average, while a minority of males enjoys a disproportionate monopoly of reproduction. In statistical language, mean reproductive success of males and females is equal, but males tend to have a higher variance in reproductive success. And, to return to the title of this chapter, the ancestral females that genes ‘look back on’ will be close to the average. But they’ll look back on an ancestral history dominated by a minority of males: that minority endowed with whatever it takes in the species concerned – large antlers, fearsome canine teeth, sheer bodily bulk, courage, or whatever it might be.

‘Courage’ can be given a more precise meaning. Any animal must balance the short-term value of reproducing now against its own long-term survival to reproduce in the future. A brutal fight against a rival male may end in victory and a harem. But it may end in death, or serious injury which presages death. Courage is at a premium. Risking death is worthwhile because the stakes for a male are so high: a huge number of pups to his name if he wins, zero and perhaps death if he loses. A female seal would give higher priority to surviving to reproduce next year. She only has one pup in a year, so she’ll maximise her reproductive success by surviving herself. Natural selection would favour females who are more risk-averse than males; would favour males who are more courageous or foolhardy. Males are biased towards a high-stakes high-risk strategy. This is probably why males tend to die younger. Even if they’re not killed in battle, their whole physiology is skewed towards living to the full while young, even at the expense of living on at all when old.

A complication is that, in some species, including elephant seals, subordinate males sneak surreptitious matings at the risk of punishment from dominant males. They may adopt a particular strategy known as the ‘sneaky male’ strategy. This means that as a Y-chromosome looks back at its history, it will see mostly a river of dominant harem-holders but also a side rivulet, that of the sneaky males. And now, a change of topic.

As will be apparent by now, my late colleague WD Hamilton had a restless and highly original curiosity, which led him to solve many outstanding riddles in evolutionary theory, problems that lesser intellects never even recognised as problems. A naturalist from boyhood, he noticed that many insect species come in two distinct types which could be named ‘dispersers’ and ‘stay-at-homes’. Dispersers typically have wings. ‘Stay-at-homes’ often don’t. It’s surprising how many species of insects have both winged and wingless members, seemingly in balanced proportion. If you like human parallels, think of human families in which one brother comfortably inherits the farm while the other brother emigrates to the far side of the world in search of an improbable fortune. In the case of plants, dandelion seeds with their fluffy parachutes are ‘winged’ dispersers, while other members of the daisy family have, to quote Hamilton, ‘a mixture of winged and wingless within a single flower head’.

To stolid common sense, it seems intuitively obvious that if parents live in a good place (and they probably do live in a good place, or they wouldn’t have succeeded in becoming parents), the best strategy for an offspring must be to stay in the same good place. ‘Stay at home and mind the family farm’ would seem to be the watchword, and that was the conventional wisdom among most evolutionary theorists before Bill Hamilton. Bill suspected, by contrast, that selection would favour a balance between stay-at-homes and dispersers, the point of balance varying from species to species. He enlisted the help of his mathematical colleague Robert May, and together they developed mathematical models that supported his intuition.

My own, less mathematical way to express Bill’s intuition is in terms of the gene’s-eye view of the past. No matter how favourable the ‘family farm’ – the environment in which parents have flourished – it is sooner or later going to be subject to a catastrophe: a forest fire perhaps, or a disastrous flood or drought. So, as a gene looks back at the history of ‘the family farm’, the parental, grandparental, and great grandparental generations may indeed have flourished there. The success story might go back an unbroken ten or even twenty generations. But if it looks far enough back into the past, the stay-at-home gene will eventually hit one of those catastrophes.

The disperser gene may look back on the recent past as one of comparative failure: life on the family farm was milk and honey. But if we look back sufficiently far, we come to a generation where only the disperser gene, the gene for wild wanderlust, made it through. There’s also the anthropomorphic point that wanderlust occasionally strikes gold.

Naked mole rat

I perhaps went too far when in 1989 I published a speculation about naked mole rats, but it serves to dramatise the point. Naked mole rats are small, spectacularly ugly (by human aesthetics) African mammals, who live underground. They are famous among biologists as the nearest mammalian approach to social insects: ants and termites. They live in large colonies of as many as 100 individuals in which only one female, the ‘queen’, normally reproduces, and she is fecund enough to compensate for the near sterility of all the other females, who function as ‘workers’. A colony can extend through a huge network of 2 or 3 miles of burrows, gathering underground tubers as food.

This much has become lore among biologists intrigued by the obvious similarity to social insects. However, one discrepancy always worried me. Although the ants and termites that we ordinarily see are wingless, sterile workers, their underground nests periodically erupt in a boiling mass of winged reproductive individuals of both sexes. These fly up to mate, after which the newly fertilised young queens settle down, lose their wings (in many cases even biting them off), dig a hole, and attempt to found a new underground nest with the aid of sterile, wingless worker daughters (and sons in the case of termites). The winged castes are Hamilton’s dispersers, and they are an essential part – indeed, the essential part – of the biology of social insects. You could say they are what the whole social insect enterprise is all about. Why don’t naked mole rats have an equivalent? Their lack of a dispersal phase is something approaching a scandal!

Not literally winged dispersers! Even I am not foolhardy enough to predict rodents with wings. But I did wonder, and still do, whether there might be a dispersal phase that nobody has spotted yet. In 1989 I wrote: ‘Is it conceivable that some already known hairy rodent, running energetically above ground and hitherto classified as a different species, might turn out to be the lost caste of the naked mole rat?’ My idea for a hitherto unrecognised dispersal caste may not have much going for it, but it is at least testable, a virtue that scientists value highly. The genome of the naked mole rat has been sequenced. If my hypothetical dispersal phase were ever discovered, some hairy mole rats should turn out to have the same genes.

I admitted the implausibility of my suggestion. How could such a hypothetical creature have been overlooked by biologists? However, I went on to make a comparison with locusts. Locusts are the terrifying ‘wanderlust’ phase of harmless ‘stay-at-home’ grasshoppers. They look different from grasshoppers and behave very differently. They are the very same grasshoppers but (oh, in a moment) they change. The genes of a harmless grasshopper have the capacity, when the conditions are right, to change (change utterly, and a terrible beauty is born). The devastating effects are all too well known. My point is that locust plagues only occasionally happen. It just takes the right conditions. Perhaps the dispersal phase of the naked mole rat has yet to erupt during the decades since biologists have been around to study the species? No wonder it has never yet been seen. Perhaps it would take only a crafty hormone injection … and a naked mole rat could become its own hairy, scurrying (though not, I suppose, winged) dispersal phase.

Another change of topic before we leave the backwards gene’s-eye view. There are two ways in which we can look back at a family tree. Conventional pedigrees trace ancestry via individuals. Who begat whom? Which individual was born of which mother? The most recent individual ancestor shared by the late Queen Elizabeth II and her husband Prince Philip was Queen Victoria. But you can also trace the ancestry of a particular gene, and you will have guessed that this is the alternative manner of tale I want to tell here. Genes, like individuals, have parent genes and offspring genes. Genes, as well as individuals, have pedigrees, family trees. But there is a significant difference between a ‘people tree’ and a ‘gene tree’. An individual person has two parents, four grandparents, eight great grandparents, etc. So a people tree is a vast ramification as you look backwards in time. Any attempt to draw it out completely will soon get out of hand. The best way to visualise it is not on paper but zooming around a computer screen. Not so the gene tree. A gene has only one parent, one grandparent, one great grandparent, etc. A gene tree is therefore a simple linear array streaking back in time, whereas a people tree bifurcates its way unmanageably into the past. This is not so when you look forwards in time, by the way. A gene can have many offspring but only ever one parent. Looking forwards, gene trees branch and branch. But this chapter is all about looking backwards.
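The contrast in scale between the two kinds of tree is easy to put in numbers. A minimal sketch, with a hypothetical function name: a people tree doubles its number of ancestral positions with every generation back, while a gene tree never has more than one. The count is of positions in the tree rather than of distinct people, because the same individual can occupy many positions.

```python
def ancestor_slots(generations_back, parents_per_node):
    """Number of ancestral positions at a given depth of a family tree."""
    return parents_per_node ** generations_back

for g in (1, 2, 3, 10, 30):
    people = ancestor_slots(g, 2)  # each person has two parents
    gene = ancestor_slots(g, 1)    # each gene copy has one parent copy
    print(g, people, gene)
```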

A particular sub-lethal gene, haemophilia, has plagued the royal families of Europe ever since the early nineteenth century. The gene tree of royal haemophilia is simple and fits the page comfortably. The equivalent people tree would want several square metres of paper to be legible. The royal haemophilia gene can be traced back to a particular individual ancestor, Queen Victoria, one of whose two X-chromosomes bore the gene. The mutation occurred, to quote Steve Jones’s mordant phrase, ‘in the august testicles’ of her father, Edward, Duke of Kent. One of Victoria’s four sons, Prince Leopold, suffered from haemophilia. The other sons, including Edward VII and his descendants such as our present monarch, King Charles III, beat the odds and were lucky to escape. Leopold survived to the age of thirty, long enough to have a daughter, Princess Alice of Albany, who inevitably carried the gene on one of her X-chromosomes. Her son Prince Rupert of Teck realised his 50 per cent probability of being afflicted and died young.

Royal haemophilia

Of Victoria’s five daughters, three (at least) inherited the gene. Princess Alice of Hesse passed it on to her son, Prince Friedrich, who died in infancy, and to two daughters, Irene and Alexandra, who passed it on to three haemophiliac grandsons of Alice, including the Tsarevich Alexey of Russia. Irene married her first cousin Henry, a common practice among royals and generally not a good idea because of inbreeding depression. But inbreeding depression was not responsible for the fact that two of their sons, Waldemar and Heinrich, suffered from haemophilia: they got it on their X-chromosome from their mother, and she’d have been equally likely to pass it on, whomever she married, cousin or not (unless the cousin was himself haemophiliac, in which case 50 per cent of her daughters would actually suffer from the disease itself). Another of Victoria’s daughters, Princess Beatrice, bequeathed the gene to her daughter the Queen of Spain, and on into the Spanish Royal Family, to the resentment, I gather, of the Spanish.

Tracing back the gene tree of the royal haemophilia gene, all lines coalesce in Victoria. And indeed, there is a flourishing branch of mathematical genetic theory called Coalescent Theory in which you look back at the history of a genetic variant in a population and trace the most recent common ancestor of that gene – the coalescent gene upon which all lines converge as you look back. Forget about individuals, look through the skin to the genes within, and you can trace two copies of a particular gene back in time until you hit the ancestor in whom they coalesce. That coalescence point is the ancestral individual in which the gene itself divided into two copies, which then went their separate ways in two siblings and eventually two lines of descendants. If you make purifying assumptions like random mating, no natural selection, and everybody has two children, the coalescent tree has an expected form that mathematicians can calculate in theory. In reality, of course, those assumptions are violated, and that’s when it becomes interesting. Royal families, for example, typically violate the assumption of random mating. Protocol and political expediency constrain them to marry each other.

Coalescent theory is an important part of modern population genetics, and very relevant to this chapter on the backwards gene’s-eye view, but the mathematics is outside my scope here. I will discuss one intriguing example: a particular study of one man’s genome – as it happens, my genome, although that isn’t why I find it intriguing. It is a remarkable fact that you can make powerful inferences about the demographic history of an entire population using the genome of just a single individual. For a rather odd reason, I was one of the earliest people in Britain to have their entire genome (as opposed to the relatively small sample done by the likes of ‘23-and-Me’) sequenced. I handed the data disc over to my colleague Dr Yan Wong, and he included a clever analysis of it in the book that we co-authored, The Ancestor’s Tale (2016). It’s rather tricky to explain, but I’ll do my best.

In every cell of my body swim twenty-three chromosomes inherited intact from my father and twenty-three from my mother. Every (autosomal) paternal gene has an exact opposite number (allele) on the corresponding maternal chromosome, but my father John’s chromosomes and my mother Jean’s chromosomes float intact and aloof from each other in all my cells. Now, here’s where it gets tricky. Take a particular gene on a John chromosome and allow it to look back at its ancestral history. Now take its opposite number (‘allele’) on the equivalent Jean chromosome, and allow it to look back in the same way. It’s the same principle as tracing the royal haemophilia gene back to Victoria. But, in this case, it is not haemophilia that is being traced, we’re looking a lot further back, and we have no hope of identifying a named individual like Victoria. We could do it with any pair of alleles, one on a John chromosome and the other on a Jean chromosome. And not just one such pair but (a sample of the) many.

Sooner or later, each gene pair, as they look back, is bound to converge on a particular individual in whom a gene once split to form the ancestor of the John gene and the ancestor of the Jean gene. I really do mean a particular individual ancestor who lived at a particular time and in a particular place. This individual had two children, one of whom was John’s ancestor and the other Jean’s ancestor. But we’re talking about a different ancestral individual – different time and place – for each Jean/John gene pair. For each gene pair, there must have been two siblings, one carrying the ancestral Jean gene and the other the ancestral John gene.

There are many overlapping people-tree routes that trace my father and my mother back to different shared ancestors. But for each of my John genes there is only one path linking it to the shared ancestor of my corresponding Jean gene. Gene trees are not the same as people trees. Each gene pair coalesces in a particular ancestor, at a particular moment in the past. You can let each pair of my genes look back, and you can find a different coalescence point in each case. You can’t literally identify the exact coalescence point for any given gene pair. But what you can do, using the mathematics of coalescent theory, is estimate when it occurred. When Dr Wong did this with my genome, he found that a large majority coalesced somewhere around 60,000 years ago, say 50,000 to 70,000.

And how should this concordance be interpreted? It means that my ancestors suffered a population bottleneck around that time. Very likely, yours did too. As my John genes and my Jean genes look back at their history, during most of those millennia they see a picture of outbreeding. But somewhere around 60,000 years into the past, the effective population size narrowed to a bottleneck. When the population is smaller, the Jean and John lineages are more likely to find themselves in a shared ancestor, simply by chance. That is why my gene pairs tend to coalesce around that time. Indeed, the coalescence data from my genome, on its own, making use of no other data, can be translated into the above graph of effective population size plotted against time. It is presumably typical for Europeans. The faint grey line shows the equivalent for an individual Nigerian, whose ancestors, it would seem, were not subject to the same bottleneck. I confess to an obscure satisfaction that, of the two co-authors of a book, one was able to use the genome of the other to make a quantitative estimate of prehistoric demography affecting not just one individual but millions.
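Why a bottleneck attracts coalescence points can be shown with a toy calculation. This is a sketch under textbook idealisations (random mating, no selection), not the analysis Dr Wong carried out, and the population sizes and generation counts are invented purely for illustration. In such an idealised population of N breeding individuals there are 2N copies of each gene, so two lineages traced back one generation land on the same parental copy with probability 1/(2N).

```python
def coalescence_mass(history):
    """Probability that two gene lineages coalesce within each period.

    history: list of (label, effective_size, generations), most recent first.
    Returns a dict mapping each label to its share of the probability.
    (Hypothetical helper for illustration only.)
    """
    still_separate = 1.0  # chance the lineages have not yet met
    mass = {}
    for label, size, generations in history:
        p = 1.0 / (2 * size)           # chance of coalescing in one generation
        stay = (1 - p) ** generations  # chance of crossing the period unmerged
        mass[label] = still_separate * (1 - stay)
        still_separate *= stay
    return mass

# Hypothetical demographic history, invented for this example.
toy_history = [
    ("recent, large population", 10_000, 200),
    ("bottleneck", 100, 50),
    ("older, large population", 10_000, 200),
]
toy_mass = coalescence_mass(toy_history)
for label, share in toy_mass.items():
    print(f"{label}: {share:.3f}")
```

In this toy history the bottleneck is the shortest period, yet it collects by far the largest share of coalescence points. That is the pattern which, read in reverse, lets a cluster of coalescence times be interpreted as a narrowing of the population.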

What else can genes tell us as they look back at their history? Zoologists are accustomed to drawing family trees of animals, and calculating which species are close cousins of other species, and which distant. Among ape species, for example, chimpanzees and bonobos are our closest living relatives, and those two species are exactly equally close to us. They are equally close because they share an ancestor with each other some 3 million years ago, and that ancestor shares an ancestor with us about 6 million years ago (see below). Gorillas are the outgroup, a more distant relative of the rest of us African apes. The ancestor we share with gorillas lived longer ago, perhaps 8 or 9 million years.

GORILLA CHIMP BONOBO HUMAN

On the previous page is the conventional way to draw a family tree, an organism-based family tree. But we can also draw a family tree from the point of view of a gene, looking back at its own history. The organism tree is unequivocal. Chimps and bonobos are close cousins of each other, and we are their closest relatives apart from each other. But while that is indeed a fact from the point of view of the whole organism, it is not necessarily the case when it is genes that look in the rear-view mirror. True, a majority of genes would ‘agree’ with each other and with the ‘people tree’ of the traditional zoologist. Nevertheless, it is perfectly possible that, from the point of view of some particular genes, the family tree could look very different. As on the opposite page, perhaps. The majority of our genes agree with the ‘people tree’. But when the gorilla genome was published in 2012, it turned out that ‘Humans and chimpanzees are genetically closest to each other over most of the genome, but the team found many places where this is not the case. Fifteen per cent of the human genome is closer to the gorilla genome than it is to chimpanzee, and 15 per cent of the chimpanzee genome is closer to the gorilla than human.’ I hope you agree that this kind of conclusion is an interesting product of the ‘backward gene’s-eye view’.

Such an anomaly could occur even within one small family. Two brothers, John and Bill, share the same parents, Enid and Tony, and the same four grandparents: Arthur and Gertrude, the parents of Enid, and Francis and Alice, the parents of Tony. (Sex chromosomes apart) each of the brothers received exactly half his genes from each of their shared parents. That’s because each is the product of exactly one egg from Enid and one sperm from Tony. And each brother received a quarter of his genes from each of the four shared grandparents, but in this case the figure is only approximate. It’s not exactly a quarter. Through the vagaries of chromosomal crossing-over, the sperm from Tony that conceived John could, by chance, have contained mostly Alice’s genes rather than Francis’s. The sperm from Tony that conceived Bill could have contained a preponderance of Francis’s genes rather than Alice’s. The egg from Enid that gave rise to John could have contained mostly Arthur’s genes, while the egg from Enid that gave rise to Bill contained a preponderance of Gertrude’s genes. It’s even theoretically possible (though vanishingly improbable) that John received all his genes from two of his grandparents, and none from the other two. Thus, the gene’s-eye view of closeness of relatedness can differ from the individual’s-eye view. The individual’s-eye view sees all four grandparents as equal contributors.
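How far the grandparental share can stray from a quarter is easy to explore with a deliberately crude simulation. Everything in this sketch is an assumption made for illustration: twenty-two autosomes of equal length, exactly one crossover per chromosome at a uniformly random position, and a fixed random seed. It computes the fraction of a single sperm from Tony that derives from Alice rather than Francis. Halving that fraction gives Alice's share of the child's whole autosomal genome.

```python
import random

def grandparent_share(rng, n_chromosomes=22, n_crossovers=1):
    """Fraction of one gamete's autosomal material from one grandparent.

    Crude model: equal-length chromosomes, a fixed number of crossovers at
    uniformly random positions, and a coin toss for which grandparental
    copy each chromosome starts on. (Hypothetical helper.)
    """
    total = 0.0
    for _ in range(n_chromosomes):
        cuts = sorted(rng.random() for _ in range(n_crossovers))
        from_alice = rng.random() < 0.5  # starting grandparental copy
        previous = 0.0
        for cut in cuts + [1.0]:
            if from_alice:
                total += cut - previous
            from_alice = not from_alice  # switch copies at each crossover
            previous = cut
    return total / n_chromosomes

rng = random.Random(1)  # fixed seed so the run is repeatable
shares = [grandparent_share(rng) for _ in range(2000)]
mean_share = sum(shares) / len(shares)

# Halve the gamete fractions to express them as shares of the child's genome.
print(mean_share / 2, min(shares) / 2, max(shares) / 2)
```

Across many simulated sperm the average contribution sits close to a quarter of the child's genome, but individual sperm scatter noticeably either side of it, which is the sense in which the figure is only approximate.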

BONOBO CHIMP GORILLA HUMAN

And the same is true of all generations prior to the immediate parental generation. Although you are quite probably descended from William the Conqueror, it is also quite likely that you have inherited not a single gene from him. Biologists tend to follow the historic precedent of tracing ancestry at the level of the whole individual organism: every individual has one father and one mother, and so on back. But the John/Bill, gorilla/chimpanzee comparison of the previous paragraphs will prove, I believe, to be the tip of an iceberg. More and more, we shall see pedigrees being drawn up from the genes’ point of view as opposed to the individual organism’s. An example is the discussion of the prestin gene in Chapter 5. Such a trend is obviously highly congenial to this book, stressing, as it does, the gene’s-eye view.

The last topic I want to deal with in this chapter on the backwards gene’s-eye view is Selective Sweeps. Among the messages from the past that the genes of a living animal whisper to us, if only we could hear them, many tell of ancient natural selection pressures. That, indeed, is what I mean by the genetic book of the dead, but here I am talking about a particular kind of signal from the past, one that geneticists have learned how to read. Present-day genes send statistical ‘signals’ of natural selection pressures. A gene pool that has recently undergone strong selection shows a certain characteristic signature. Natural selection leaves its mark. A Darwinian signature. Here’s how.

Two genes that sit close to one another on a chromosome tend to travel together through the generations. This is because chromosomal crossing over is relatively unlikely to split them: a simple consequence of their proximity to each other. If one gene is strongly favoured by natural selection it will increase in frequency. Of course, but mark the sequel. Genes whose chromosomal position lies close to a positively selected gene will also increase in frequency: they ‘hitch-hike’. This is especially noticeable when the linked genes are neutral – neither good nor bad for survival. When a particular region of a chromosome contains a gene that is under strong selection in its favour, the geneticist notices a diminution in the amount of variation in the population, specifically in the hitch-hiking zone of the affected chromosome. Because of the hitch-hiking, natural selection of one favoured gene ‘sweeps’ away the variation among nearby neutral genes. This ‘selective sweep’ then shows up as a ‘signature’ of selection.
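The hitch-hiking argument can be sketched as a small simulation. This is a toy model under stated assumptions, not the method geneticists actually use to detect sweeps: a population of 500 single-chromosome individuals, a 50 per cent survival advantage for the favoured gene, one neutral gene that is almost never separated from it (recombination chance 0.001 per generation) and one neutral gene that is effectively unlinked (0.5). The function names and every number are hypothetical choices made for illustration. Variation is scored as the chance that two randomly drawn individuals carry different neutral variants.

```python
import random
from collections import Counter

def diversity(alleles):
    """Chance that two draws (with replacement) carry different alleles."""
    n = len(alleles)
    return 1.0 - sum((count / n) ** 2 for count in Counter(alleles).values())

def run_sweep(rng, pop_size=500, advantage=0.5, r_near=0.001, r_far=0.5, n_variants=10):
    """Return (near_diversity, far_diversity) once the favoured gene has fixed,
    or None if the favoured gene was lost by chance. All defaults are assumptions."""
    # Each individual: [carries favoured gene?, nearby neutral variant, distant neutral variant]
    pop = [[False, rng.randrange(n_variants), rng.randrange(n_variants)]
           for _ in range(pop_size)]
    pop[0][0] = True  # a single new favoured mutation
    while True:
        carriers = sum(ind[0] for ind in pop)
        if carriers == 0:
            return None
        if carriers == pop_size:
            return (diversity([ind[1] for ind in pop]),
                    diversity([ind[2] for ind in pop]))
        # Carriers of the favoured gene are more likely to be chosen as parents.
        weights = [1.0 + advantage if ind[0] else 1.0 for ind in pop]
        parents = rng.choices(pop, weights=weights, k=pop_size)
        partners = rng.choices(pop, weights=weights, k=pop_size)
        next_pop = []
        for parent, partner in zip(parents, partners):
            child = list(parent)
            if rng.random() < r_near:   # rare crossover splits the close neighbours
                child[1] = partner[1]
            if rng.random() < r_far:    # the distant gene is reshuffled freely
                child[2] = partner[2]
            next_pop.append(child)
        pop = next_pop

def sweep_until_fixed(seed=1):
    """Repeat the experiment until the favoured gene spreads instead of being lost."""
    rng = random.Random(seed)
    while True:
        result = run_sweep(rng)
        if result is not None:
            return result

near, far = sweep_until_fixed()
print(f"variation left next to the favoured gene: {near:.3f}")
print(f"variation left at the distant gene:       {far:.3f}")
```

In this sketch the nearby neutral gene should end up with far less variation than the distant one, because the variant that happened to sit beside the original favoured mutation is carried along with it. That local dip in variation is the ‘signature’ described above.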

I find the ‘backwards’ way of looking at ancestral history illuminating. But the most important ‘experience’ that a gene can ‘look back on’ is easily overlooked because it hides in plain view. It is the companionship of other genes of the species: other genes with which it has had to share a succession of bodies. I am not talking here about genes being linked close to each other on the same chromosome. I am now talking about shared membership of the same gene pool, and hence of many individual bodies. This companionship is the topic of the next chapter.

12 Good Companions, Bad Companions

The previous chapter could be expanded with an indefinite number of examples of the backward gene’s-eye view. Genes look back on a series of environments variously characterised by trees, soil, predators, prey, parasites, food plants, water holes, etc. But the external environment is only part of the story. It leaves out the most important kind of ‘experience’ of a gene. Far more important is the experience of rubbing shoulders with all the other genes in a long succession of bodies: partners through dynasties of mutual collaboration in the subtle arts of building bodies. That is the central point of this chapter.

The genes within any one gene pool are travelling bands of good companions, journeying together, and cooperating with each other down the generations. Genes in other gene pools, gene pools belonging to other species, constitute parallel bands of travelling companions. These bands do not include the genes of other species. That is precisely how biologists like to define a species (although the definition sometimes blurs in practice, especially when new species are being born).

Sexual reproduction validates the very notion of a species, more precisely the notion of a gene pool: a pool of genes like a stirred pool of water. The gene pool is thoroughly stirred in every generation by sexual reproduction, but it doesn’t mix with any other such pool – pools belonging to other species. Children resemble their parents but, because the gene pool is stirred, they resemble them only slightly more than they resemble any random member of the species – and much more than they resemble a random member of another species. The gene pool of each species sloshes about in a watertight compartment of its own, isolated from all others.

As I said, that is part of the very definition of a ‘species’, at least the most widely adopted definition, the one codified by that lofty patriarch among evolutionists, Ernst Mayr (1904–2005):

Species are groups of actually or potentially interbreeding natural populations, which are reproductively isolated from other such groups.

Fossils, being dead to the possibility of actually interbreeding – beyond breeding at all – force a retreat to Mayr’s ‘potentially’. When we say that Homo erectus was a separate species, distinct from modern Homo sapiens, the Mayr definition would be interpreted as meaning, ‘If a time machine enabled us to meet Homo erectus, we would be incapable of interbreeding with them.’ A niggling difficulty arises over ‘incapable’. There are species that can be persuaded to interbreed in captivity but would not choose to do so in the wild. Chapter 9’s example of the two crickets Teleogryllus oceanicus and commodus is only one of several. Even if we were capable of interbreeding with Homo erectus, say by artificial insemination, would we – or they – choose to do so by the normal, natural means? Never mind, that is a detail that might concern a pernickety taxonomist or philosopher, but we can pass it by.

If, as most anthropologists believe, we descend from Homo erectus, there must have been intermediates during the transitional phase: intermediates that would defy classification. Nobody who has thought it through would suggest that suddenly a sapiens baby was born to proud erectus parents. Every animal ever born throughout evolutionary history would have been classified in the same species as its parents, not only by the interbreeding criterion but by all sensible criteria. That fact – though it troubles some minds – is totally compatible with the fact that Homo sapiens is descended from Homo erectus, those two species being distinct species incapable – let us presume – of breeding with each other. It’s also compatible with the fact that you are descended from a lobe-finned fish, with every intermediate along the way being a member of the same species as its parents and its children.

Moreover, when a species splits into two daughter species in the process known as speciation, there is bound to be an interregnum when the two are still capable of interbreeding. The split originates accidentally, imposed perhaps by a geographic barrier such as a mountain range or a river or stretch of sea. It is probable that chimpanzees and bonobos started to go their separate evolutionary ways when two sub-populations found themselves on opposite sides of the Congo river. The two populations were physically prevented from interbreeding – the flow of genes was halted by the flow of water between them. For a while, they could potentially interbreed, and maybe occasionally did so when an individual inadvertently crossed the river on a floating log. But the geographically imposed lack of gene flow freed them to evolve in separate directions. Those different directions could have been guided by natural selection, or unguided in a process of random drift. It doesn’t matter, the point is that the compatibility between their genes gradually declined until a stage was reached when, even if they should chance to meet, they could no longer interbreed in actuality. The initial geographic barrier doesn’t necessarily come about through an environmental change like an earthquake diverting a river. Geography can stay the same while a pregnant female, for instance, gets accidentally washed ashore on a deserted island. Or the other side of a river.

But why, in any case, do the genes of two separated populations tend to become incompatible as companions, thereby preventing interbreeding? One reason is that the two sets of chromosomes need to pair off in the process of meiosis, when gametes are made. If they become sufficiently different, say on opposite sides of a barrier, hybrids, if any, would be unable to make gametes. They might live, but could not reproduce. Another reason – back to the central point of this chapter – is that genes, on either side of the barrier, are naturally selected to cooperate with other genes on the same side, but not the other. After enough time has elapsed in physically enforced separation, two gene pools become so incompatible that interbreeding becomes impossible even if the physical barrier is removed. Chimpanzees and bonobos haven’t quite reached that stage. Hybrids can be born in captivity.

There doesn’t have to be a distinct barrier, like a river, for geographically based speciation to occur. A mouse in Madrid never meets a mouse in Vladivostok but there could well be continuous local gene flow across the 12,000-kilometre gap between them. Given enough time, their descendants could diverge genetically until they could no longer interbreed even if they should somehow contrive to meet. Speciation would have occurred, the barrier being nothing more than sheer distance rather than an unswimmable river or sea, or an impassable desert or mountain range, and despite continuous gene flow locally across the entire range. We have here the spatial equivalent of the temporal continuum between Homo erectus and Homo sapiens. In both cases the extremes never meet. Yet in both cases there can be an unbroken chain of intermediates happily breeding all the way across the range: range in space for the example of the mice; range in time for the example of erectus and sapiens.

Occasionally, the chain of intermediates wraps around in a circle, bites itself in the tail, and we have a so-called ‘ring species’. Salamanders of the genus Ensatina live all around the four edges of California’s Central Valley but don’t cross the valley. If you start sampling at the southern end of the valley and work your way up the west side to the north, go eastwards across the north end of the valley, then down the eastern side and back around to your starting point, you notice a fascinating thing. The salamanders all along your route around the edge of the valley can interbreed with their neighbours. Yet they gradually change as you go around, and when you arrive back at your starting point, the ‘last’ species of the ring cannot interbreed with the ‘first’. A ring species is a rare case where you can see laid out in the spatial dimension the kind of evolutionary change that you could see along the time dimension if only you lived long enough.

Such considerations render pointless all heated arguments about whether or not closely related animals, living or fossil, belong to the same species. It is a necessary consequence of evolution that there must be, or must have been, intermediates that you cannot forcibly assign to either species. It would be worrying if it were otherwise. But of course most species in existence are clearly distinct from most other species by any criterion, because of the long time that has elapsed since their ancestors diverged. As for the grey areas where potential interbreeding is even an issue, and where species definition is problematical, this chapter will not treat them further.

Where external environments are concerned, the genes of a mole speak to us of damp, dark, moist tunnels, of earthy smells, of earthworms and beetle larvae crawling between tangled rootlets and filaments of fungal mycelium and mycorrhizae. The genes of a squirrel have a very different ancestral autobiography, a tale of airy greenery, waving boughs, acorns, nuts, and sunlit glades to be crossed between trees. We could weave a similar list for any species. The point of this chapter, on the other hand, is that the genes’ external ‘experience’ of damp, dark soil, or forest canopy, grassy plains, coral reefs, the deep sea, or whatever it might be, is swamped by the more immediate and salient internal experiencing of other genes in the stirred gene pool. This chapter is about the ‘good companions’ with which the genes have travelled and collaborated, in body after body since earlier times: parting from and re-joining, ever encountering and re-encountering familiar sets of companion genes, collaborating in the difficult arts of building livers and hearts, bones and skin, blood corpuscles and brain cells. The details will be tweaked by ‘external’ pressures: the best heart, kidney, or intestine for a burrowing vermivore is doubtless not the same as the best heart, kidney, or intestine for a tree-climbing nut-lover. But a centrally important quality of a successful gene will be the ability to collaborate with the other genes of the shared gene pool, be it mole, squirrel, hedgehog, whale, or human gene pool.

Every biochemistry lab has on its wall a huge chart of metabolic pathways, a bewildering spaghetti of chemical formulae joined by arrows. Below is a simplified version in which chemicals are represented by blobs rather than having their formulae spelled out. The lines represent chemical pathways between the blobs. This particular diagram refers to the gut bacterium Escherichia coli, but something similar, and just as bewildering, is going on in your cells.

Every one of those hundreds of lines is a chemical reaction performed inside a living cell, and each one is catalysed by an enzyme.

Every enzyme is assembled under the influence of a specific gene (or often two or three genes, because the enzyme molecule may have several ‘domains’ wrapped around each other, each domain being a protein chain). The genes that make these enzymes must cooperate, must be good companion genes in the sense of this chapter.

All mammals have almost exactly the same set of over 200 named bones, connected in the same order, but differing in size and shape. We saw the principle in the crustaceans of Chapter 6. And the same is true of the metabolic pathways diagrammed above. They are almost the same in all animals but different in detail. And, although they may be engaged in joint enterprises that are similar, the cartels of mutually compatible genes will not be compatible with parallel cartels evolving in other lineages: antelope cartels versus lion cartels, say. Antelopes and lions both need metabolic pathways in all their cells, and both need hearts, kidneys, and lungs, but they’ll differ in details appropriate to herbivores versus carnivores. And more obviously so in teeth, intestines, and feet, for reasons we’ve covered already. If they were somehow to mix in the same body, they wouldn’t work well together.

I shall say that two separate gene pools, for instance an impala gene pool and a leopard gene pool, represent two separate ‘syndicates’ of ‘cooperating’ genes. Building a body is an embryological enterprise of immense complexity, involving feats of cooperation between all the genes in the active genome. Different kinds of body require different embryological ‘skills’, perfected over evolutionary time by different suites of mutually compatible genes: compatible with members of their own syndicate but incompatible with other syndicates simultaneously being built in other gene pools. These cooperating cartels are assembled over generations of natural selection. The way it works is that each gene is selected for its compatibility with other genes in the gene pool, and vice versa. So cartels of mutually compatible, cooperating genes build up. It is tempting but misleading to speak of alternative cartels being selected as whole units versus other cartels as whole units. Rather, cartels assemble themselves because each member gene is separately selected for its compatibility with other genes within the cartel, which are themselves being selected at the same time.

Within any one species, genes work together in embryological harmony to produce bodies of the species’ own type. Other cartels in other species’ gene pools self-assemble, and work together to produce different bodies. There will be carnivore cartels, herbivore cartels, burrowing insectivore cartels, river-fishing cartels, tree-climbing, nut-loving cartels, and so on. My main point in this chapter on ‘Good Companions’ is that by far the most important environment that a gene has to master is the collection of other genes in its own gene pool, the collection of other genes that it is likely to meet in successive bodies as the generations go by. Yes, the external ecosystem furnished by predators and prey, parasites and hosts, soil and weather, matters to the survival of a gene in its gene pool. But of more pressing moment is the ecosystem provided by the other genes in the gene pool, the other genes with which each gene is called upon to cooperate in the construction and maintenance of a continuing sequence of bodies. It is an easily dispelled paradox that my first book, The Selfish Gene, could equally well have been called The Cooperative Gene. Indeed, my friend and former student Mark Ridley wrote a fine book with that very title. In his words, which I’d have been pleased to have written myself,

The cooperation between the genes of a body did not just happen. It required special mechanisms for it to evolve, mechanisms that arrange affairs such that each gene is maximally selfish by being maximally cooperative with the other genes in its body.

As inhabitants of today’s technologically advanced world, we are aware of the power of cooperation between huge numbers of specialist experts. SpaceX employs some 10,000 people, cooperating in the joint enterprise of launching massive rockets into space and – even more difficult – bringing them back and gently landing them in a fit state to be re-used. Many different specialists are united in intimate cooperation: engineers, mathematicians, designers, welders, riveters, fitters, turners, computer programmers, crane operators, quality control checkers, 3-D printer operators, software coders, inventory control officers, accountants, lawyers, office workers, personal assistants, middle managers, and many others. Most of the experts in one field have little understanding of what experts in other parts of the enterprise do, or how to do it. Yet the feats that we humans can achieve when thousands of us deploy our complementary skills, in well-oiled collaboration but in ignorance of each other’s role, are staggering.

The human genome project, the James Webb Telescope, the building of a skyscraper or a preposterously oversized cruise ship, these are stunning achievements of cooperation. The Large Hadron Collider at CERN brings together some 10,000 physicists and engineers from more than 100 countries, speaking dozens of languages, working smoothly together to pool their diverse expertise. Yet these huge accomplishments of mass cooperation are more than matched by the nine-month collaborative enterprise of building each one of us in our mother’s womb: a feat of cooperation among billions of cells, belonging to hundreds of cell types (different ‘professions’), orchestrated by about 30,000 intimately cooperating genes, exceeding the personnel count we find in a large human enterprise such as SpaceX. Cooperation is key, in both building a body and building a rocket.

The genes that build a body must cooperate with all the other companions that the sexual lottery throws at them as the generations go by. They must cooperate not only with the present set of companions, those in today’s body. In the next generation, they’ll have to cooperate with a different sample of companions drawn from the shared gene pool. They must be ready to cooperate with all the alternative genes that march with them down the generations within this gene pool – but no other gene pool. This is because Darwinian success, for a gene, means long-term success, travelling through time over many generations, in many successive bodies. They must be good travelling companions of all the genes in the stirred gene pool of the species.

The 1957 film of JB Priestley’s novel The Good Companions had an accompanying song with a not uncatchy tune, of which the refrain was,

Be a good companion,
Really good companion,
And you’ll have good companions too.

It is a song whose evoked mutualism suits the travelling troupe of genes, which constitutes the active gene pool of a species such as ours. Sexual recombination of genes gives meaning to the very existence of the ‘species’ as an entity worth distinguishing with a name at all. Without it, as is the case with bacteria, there is no distinct ‘species’, no clear way to divide the population with confidence into discrete nameable groups. It is sexual reproduction that confers identity on the species. Some bacterial types are not far from being a big smear, grading into each other as they promiscuously share genes. The attempt to assign discrete species names to such bacteria is a losing battle in a way that doesn’t apply to animals like us, where sexual exchange is limited to sexual encounters between a male and a female of the same species – and no other species by definition. As already stated, where fossils are concerned we have to guess, based on their anatomical similarity, whether they would have been able to interbreed when they were alive. This involves subjective judgement, which is why naming fossils such as Homo rhodesiensis and Homo heidelbergensis is a matter of aggravated controversy between ‘lumpers’ and ‘splitters’. But notwithstanding naming disagreements, which can even become acrimonious, we remain confident that the gene pool surrounding every one of those fossils was a troupe of travelling companions isolated from other gene pools – even though imperfectly isolated during episodes of speciation. Bacteria largely deny us that confidence. So-called ‘species’ of bacteria are not clearly delimited.

Every working gene, ‘expert’ in rendering up its own contribution to the collaborative building of an embryo, is confined to its own gene pool. Repeated cooperation among successive samples drawn from the same troupe of travelling companions has selected genes largely incapable of working beneficially with members of other troupes. Not entirely, as we see from headlined examples like jellyfish genes transplanted into cats and making them glow in the dark. Genes are normally not put to that kind of test. Mules and hinnies, ligers and tigons, are almost always sterile. Their sets of travelling companions are still compatible enough to collaborate in building strong bodies. But their compatibility breaks down when it comes to chromosomal pairing-off in meiosis, the process of cell division that makes gametes. Mules can pull a cart, but they can’t make fertile sperms or eggs.

Nature doesn’t transplant antelope genes into leopards. If it did, a few might work normally. There are broad similarities between the embryologies of all mammals, and all mammals doubtless share genes for making most layers of the mammalian palimpsest. But that doesn’t undermine this chapter. Those genes concerned with what makes a leopard a predator, and an antelope its herbivorous prey, would not work harmoniously together. In childishly crude terms, leopard teeth wouldn’t sit well with antelope guts and antelope feeding habits. Or vice versa. In the language of this chapter, companions that travel well together in one gene pool would not be good companions in the other. The collaboration would fail.

The principle is illustrated by an old experiment of EB Ford, the eccentrically fastidious aesthete from whom I learned my undergraduate genetics. Most practical geneticists work on lab animals or plants, breeding fruit flies or mice in the laboratory. But Ford walked a minority path among geneticists. He and his collaborators monitored evolutionary change in gene pools, in the wild. A lifelong authority on butterflies and moths, he went out into the woods and fields, heaths and marshes of Britain, waving his butterfly net and sampling wild populations. He inspired others to do the same kind of thing with wild fruit flies, wild snails and flowers, as well as other species of butterflies and moths. He founded a whole discipline called Ecological Genetics and wrote the book of that title. The piece of work that I want to talk about here was a field study of wild populations of lesser yellow underwing moths, in Scotland and some of the Scottish islands. Ford knew it as Triphaena comes, but it is now called Noctua comes, following the strict precedence rules of zoological nomenclature.

The species is polymorphic, meaning there are at least two genetically distinct types coexisting in significant proportions in the wild. Not in England, however, nor in much of mainland Scotland, where all the lesser yellow underwings look like the pale upper one in the picture. But in some of the Scottish islands there exists, in significant numbers, a second morph, of darker colour, called curtisii, evidently named after the entomologist and artist John Curtis (1791–1862). I thought it fitting to use Curtis’s own painting of the curtisii morph and the cowslip, and I asked Jana Lenzová to paint in the light morph to complete the picture.

Dark and light morphs of lesser yellow underwing

The difference between the two morphs is controlled by a single gene, which we can call the curtisii gene. Curtisii is nearly dominant. This means that if an individual has either one curtisii gene (‘heterozygous’) or two curtisii genes (‘homozygous for curtisii’), it will be dark. If dominance were complete, heterozygous individuals with one curtisii gene would look exactly the same as homozygotes with two. Curtisii being only nearly dominant, the heterozygotes are almost the same as the curtisii homozygotes but slightly lighter. Heterozygotes are always darker than individuals homozygous for the standard comes gene, which is therefore called recessive.
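As a minimal sketch of what this means for a breeding experiment, the snippet below enumerates the offspring of a cross between two heterozygous moths. The letters are a notation assumed here, not the text’s: ‘C’ stands for the curtisii gene and ‘c’ for the standard comes gene. For the purpose of counting, the sketch treats dominance as complete and classes any moth with at least one ‘C’ as dark, ignoring the slight lightening of heterozygotes.

```python
from collections import Counter
from itertools import product

def cross(parent_a, parent_b):
    """Count offspring genotypes when each parent passes on one of its two gene copies."""
    return Counter("".join(sorted(pair)) for pair in product(parent_a, parent_b))

def is_dark(genotype):
    """Dark if at least one copy of the curtisii gene ('C') is present."""
    return "C" in genotype

offspring = cross("Cc", "Cc")
dark = sum(n for genotype, n in offspring.items() if is_dark(genotype))
light = sum(n for genotype, n in offspring.items() if not is_dark(genotype))
print(dict(offspring))                    # {'CC': 1, 'Cc': 2, 'cc': 1}
print(f"dark : light = {dark} : {light}")  # dark : light = 3 : 1
```

The familiar three-to-one ratio of dark to light is the kind of tidy Mendelian segregation that dominance produces.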

Like his mentor Ronald Fisher, whom we’ve already met, Ford liked to speak of ‘modifiers’, genes whose effect is to modify the effects of other genes. According to Fisher’s theory of dominance, to which Ford subscribed, when a gene first springs into existence by mutation, it is typically neither dominant nor recessive. Natural selection subsequently drives it towards dominance or recessiveness via the gradual accumulation, through the generations, of modifiers. Dominance is not a property of a gene itself, but a property of its interactions with its companion modifiers.

Modifiers don’t change the major gene itself. What they change is how it expresses itself, in this case its degree of dominance. The language of this chapter would say that a major gene such as curtisii has modifiers among its ‘good companions’, which affect its dominance, meaning its tendency to express itself when heterozygous. For reasons we needn’t go into, natural selection favoured a significant proportion of dark curtisii morphs on certain Scottish islands. And one way this favour showed itself, according to the theory of Fisher and Ford, was by selection in favour of modifiers that increased its dominance.

Barra is an island in the Outer Hebrides, west of Scotland. Orkney, north of Scotland, is an archipelago 340 kilometres from Barra as the crow flies, and too far for the moth to fly. Ford collected and studied moths from both these locations. Both have mixed populations of lesser yellow underwings, the normal pale form living alongside significant numbers of dark curtisii morphs. Breeding experiments, with both Barra and Orkney moths, separately confirmed the dominance of curtisii within both islands. However, when Ford took moths from Barra and crossed them with moths from Orkney, he got a remarkable result. The dominance broke down. It disappeared. No longer did Ford see tidy Mendelian segregation of dark versus light forms. Instead there was a messy spectrum of intermediates. Dominance had disappeared.

What had evidently happened was this. Dominance on Barra had evolved by an accumulation of mutually compatible modifiers, good Barra companions. Dominance on Orkney had independently and convergently evolved by a different consortium of modifier genes, good Orkney companions. When Ford bred across islands, the two sets of modifiers couldn’t work together. It was as though they spoke different languages. To work properly, each modifier needed its normal set of good companions, the set that had been built up over generations of selection on the different islands. That’s what being good companions is all about, and Ford’s experiment dramatically demonstrates a principle that I believe to be general. The ‘major’ gene, curtisii, is the same on both Barra and Orkney. However, for all that a gene itself is the same, its dominance can be built up in more than one way by different consortia of modifiers. This seems to have been the case with curtisii on different islands.

There’s a potential fallacy lurking here. It’s easy to presume that the Barra good companions lie close to each other on a chromosome and therefore segregate as a unit. And likewise, the Orkney consortium of good companions. That kind of thing can happen, and Ford and his colleagues discovered it in other species. Natural selection can favour inversions and translocations of bits of chromosome that bring good companions closer to each other. Sometimes they end up so close that they are called a ‘supergene’, so close that they are rarely separated by crossing over. This is an advantage, and the translocations and inversions that contribute to the building of a supergene are favoured by natural selection. But if Ford’s modifiers had been clustered together as a supergene in the case of his yellow underwings, he wouldn’t have got the results that he did.

Supergenes can be demonstrated in the lab by breeding large numbers of individuals for many generations until suddenly, by a freak of chromosomal crossing-over, the supergene is split. But the supergene phenomenon is not necessary for good companionship, and there’s no reason to suppose it applies in this case of the lesser yellow underwing. The suites of cooperating modifiers could lie on different chromosomes all over the genome. Separately, in their respective island gene pools, they were assembled by natural selection as good team workers in each other’s presence. In this case, they work well together to increase the dominance of the curtisii gene. But the principle is more general than that. We don’t have to subscribe to the Fisher/Ford theory of dominance in particular.

Natural selection favours genes that work together in their own gene pool, the gene pool of their species. Genes that go with being a carnivore (say, genes for carnivorous teeth) are naturally selected in the presence in the same gene pool of other ‘carnivorous genes’ (say, genes for short carnivorous intestines whose cells secrete meat-digesting enzymes). At the same time, on the herbivore side, genes for flat, plant-milling teeth flourish in the presence of genes for long, complicated guts that provide havens for plant-digesting micro-organisms. Once again, the alternative suites of genes may be distributed all over the genome. There’s no need to assume that they cluster together on any particular chromosome.

Unfortunately, good companionship sometimes breaks down. It is even subject to sabotage. We’ve already met ways in which the genes within a body can be in conflict with one another. The uneasy pandemonium of genes within the genome, sometimes cooperating, sometimes disputing, is captured in Egbert Leigh’s ‘Parliament of Genes’. Each acts ‘in its own self-interest, but if its acts hurt the others, they will combine together to suppress it.’

Cell division within the body is vulnerable to occasional ‘somatic’ mutation. Of course it is. How could it not be? We are familiar with the idea that random copying errors, mutations, produce the raw material for natural selection between individuals. Those ‘germline’ mutations occur in the formation of sperms and eggs, and they are then inherited by an individual’s children. These are the mutations that play an important role in evolution. But most acts of cell division occur within the body – somatic as opposed to germline mutation – and they too are subject to mutation. Indeed, the mutation rate per mitotic division is higher than for meiotic division. We should be thankful our immune system is so good at spotting the danger early. Most somatic mutations, like most germline mutations, are not beneficial to the organism. Sometimes they are beneficial to themselves but bad for the organism, in which case they may engender malignant tumours – cancers. Subsequent natural selection within the tumour can generate a progression through increasingly ominous ‘stages’ of cancer. I shall return to this.

We can think of the (somatic) cells in a developing embryo as having a family history within the body, springing from their grand ancestor, the single fertilised egg cell of a few months or weeks previous. At any stage in this history of descent, starting with the embryo and on throughout the rest of life, somatic mutation can occur. Vertebrate development is the product of countless cell divisions, so embryologists have found it convenient to trace cell lineages in a simpler organism. The tiny roundworm Caenorhabditis elegans has only 959 cells. It was the genius of the great molecular biologist Sydney Brenner to pick this animal out as the ideal subject for a genre of research that has since spread to dozens of labs throughout the world. Its embryo at one of its developmental stages has precisely 558 cells. Every one of those 558 cells has its own ‘ancestral’ sequence within the developing embryo. The pedigree of each of those 558 cells within the embryo has been painstakingly worked out (illustration below). Necessarily, it’s impossible to print the details legibly on one page of a book, but you can expand it here (https://www.wormatlas.org/celllineages.html) and get an idea of the diverging pedigree of cells in the embryo, consisting of ‘families’ and ‘sub-families’. If you could read the labels by the side of families of cells, you’d see things like ‘intestine’, ‘body muscle’, ‘ring ganglion’. We shall have need to return to that idea of families of cells procreating in the embryo.

Now, if that’s what the cellular pedigree looks like for a mere 558 roundworm cells, just think what it must look like for our 30 to 40 trillion cells. Similar labels – muscle, intestine, nervous system, etc. – could be affixed to cells in a human embryo (opposite). This is true even though the pedigrees are not determined so rigidly in a vertebrate embryo, and we can’t enumerate a finite tally of named cells. It’s important to stress that these different families of cells within the developing embryo are, until something goes wrong, genetically identical. If they weren’t, they might not cooperate. When something goes wrong and they’re no longer genetically identical, well that’s when there’s a risk of their becoming bad companions. And then there’s a risk of their evolving, by natural selection within the body, to become very bad companions indeed: cancers.

As you can see on the diagram on the facing page, after some early cell generations within the embryo, the pedigree of our cells splits into three major families: the ectoderm, the mesoderm, and the endoderm. The ectoderm family of cells is destined to give rise, further down the line, to skin, hair, nails, and those hugely magnified nails that we know as hooves. Ectodermal derivatives also contribute the various parts of the nervous system. The endoderm family of cells branches to give rise to sub-families that eventually make the stomach and intestines; and other sub-families that make the liver, lungs, and glands such as the pancreas. The mesoderm dynasty of cells spawns numerous sub-families, which branch again and again to produce muscle, kidney, bone, heart, fat, and the reproductive organs, although not the germline, which is early hived off and sequestered for its privileged destiny, on down the generations.

Somatic mutants apart, every one of the cells in the expanding pedigree has the same genome, but different genes are switched on in different tissues. That is to say they are epigenetically different while being genetically the same (see the relevant endnote if popular hype has confused you as to the true meaning of ‘epigenetics’). Liver cells have the same genes as muscle cells, but once they pass beyond a certain stage in embryonic development, only liver-specific genes are active there. And the liver ‘family’ of cells in the pedigree goes on dividing until the liver is complete. They then stop dividing. The same applies to all the ‘families’, which each have their own stopping time. Cells must ‘know’ when to stop dividing. And that is where trouble can step in.

With an important reservation, the number of cell generations before the arresting of cell division varies from tissue to tissue and is typically between forty and sixty. That may seem surprisingly few. But remember the power of exponential growth. Fifty liver cell generations, if each one was a division into two (fortunately it isn’t), would yield a liver the size of a large elephant. Different cell lines stop dividing after different limits, producing end organs of different sizes. You can see how important it is for each cell line to know when to stop dividing.
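The arithmetic behind that elephant-sized liver can be checked with a rough sketch. Fifty rounds of doubling give 2 to the power 50, about 1.1 × 10^15 cells. To turn a cell count into a weight we need a mass per cell, and the figure of 3 nanograms used here is my own illustrative assumption, not a number from the text. The function names are likewise invented for the example.

```python
# Assumption (not from the text): roughly 3 nanograms per liver cell.
ASSUMED_CELL_MASS_KG = 3e-12


def cells_after(generations: int) -> int:
    """Cell count if every cell divides into two at every generation."""
    return 2 ** generations


def mass_kg(generations: int, cell_mass_kg: float = ASSUMED_CELL_MASS_KG) -> float:
    """Total mass of the lineage under the assumed per-cell mass."""
    return cells_after(generations) * cell_mass_kg


if __name__ == "__main__":
    # Forty, fifty and sixty generations span the range quoted in the text.
    for g in (40, 50, 60):
        print(f"{g} generations: {cells_after(g):.3e} cells, ~{mass_kg(g):,.0f} kg")
```

Under that assumed cell mass, forty generations give an organ of a few kilograms, fifty give a few tonnes, and sixty give thousands of tonnes. The point is not the exact figures but how steeply the answer depends on when a lineage stops.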

Cactus with somatic mutation

Every one of the 30 trillion cells in a body was made by a cell division. And every one of those cell divisions is vulnerable to somatic mutation. Now we come to that ‘important reservation’, the one relevant to the topic of bad companions. The cells in a lineage are genetically identical only if no somatic mutation intervenes during the lineage’s successive generations. Most somatic mutations are harmless. But what if a somatic mutation arises in a cell such that it changes its behaviour and refuses to stop dividing? Its lineage in the ‘family tree’ doesn’t come to a disciplined halt, but goes on reproducing out of control. The daughter cells of the mutant cell inherit the same rogue mutation, so they too divide. And their daughter cells inherit the rogue gene, so … This is the kind of thing that produces weird growths such as adorns the cactus opposite.

Let’s follow the subsequent history of a rogue cell’s descendants, for example in a human. Reproducing for an indefinite number of generations without discipline, these cells will now be subject to a form of natural selection. Why say ‘a form of’? It is natural selection, plain and simple. The rogue cells will be subject to natural selection, every bit as Darwinian as the natural selection that chooses the fastest pumas or pronghorns, the prettiest peacocks or petunias, the most fecund codfish or dandelions. Rogue somatic mutant cells can evolve, by natural selection within the body, into cancers that spread menacingly (‘metastasis’) to other parts of the body. Now natural selection of cells within the tumour will favour those that become better cancers. What does ‘better’ mean, for a cancer? They become expert, for example, at usurping a large blood supply to nurture themselves. The whole subject, fascinating, disturbing, and not at all surprising to a Darwinist, is expounded in books such as Athena Aktipis’s The Cheating Cell, and The Society of Genes by Itai Yanai and Martin Lercher.

Since cancers evolve by natural selection (within the body), we should treat their evolutionary adaptations in just the same way we might treat the adaptations of pronghorn or codfish, except that the ecological environment is the interior of a (say) human body instead of the sea or an open prairie. This chapter’s discussion of Good Companions has prepared us for the idea of an ecology of genes within the body, to parallel the more conventional idea of an external ecology. And that internal ecology is also the setting where bad companions can thrive. An important difference is that natural evolution in the open sea or prairie goes on into the indefinite future. The evolution of a cancer tumour ends abruptly with the death of the patient, whether that death is caused by the cancer or something else. The cancer evolves to become better and better at (as an inadvertent by-product) killing itself. This, too, should not surprise. Natural selection, as I’ve said over and over, has no foresight. A tumour cannot foresee that increased malignancy will eventually kill the tumour itself. Natural selection is the blind watchmaker. Despite ending with the death of the organism, the number of generations of cell division in a tumour is large enough to accommodate constructive evolutionary change. Constructive from the point of view of the cancer. Destructive for the patient. Athena Aktipis’s book artfully treats the evolution of cancer cells in the body in just the kind of way we might treat the evolution of buffalos or scorpions in the Serengeti.

Cancer cells, then, or rather the mutant genes that turn cells cancerous, are one kind of ‘bad companion’. Another type is the so-called segregation distorter. Sperms and eggs – gametes – are ‘haploid’ cells you’ll remember, having only one copy of each gene, instead of two like normal body cells. The special kind of cell division called meiosis makes haploid gametes (having only one set of chromosomes) out of diploid cells, which have two sets of chromosomes, one set from the individual’s mother and another set from the father. It is only when gametes are made by meiosis that the two sets meet each other in the same chromosome. Meiosis performs an elaborate shuffle, cutting and pasting exchanged portions of paternal and maternal chromosomes into a new set of mixed-up chromosomes. Every gamete is unique, having different assortments of paternal and maternal genes in each of its (twenty-three in humans) chromosomes. The result of the shuffle is that each gene from the diploid set of (forty-six in humans) chromosomes has a 50 per cent chance on average of getting into each gamete.

The ‘phenotypic effect’ of a gene commonly shows itself somewhere in the body – it might affect tail length or brain size or antler sharpness. But what if a gene were to arise that exerted its phenotypic effect on the process of gamete production itself? And what if that effect was a bias in gamete production such that the gene itself had a greater than 50 per cent chance of ending up in each gamete? Such cheating genes exist – ‘segregation distorters’. Instead of the meiotic shuffle resulting in a fair deal to each gamete as it normally does, the deal is biased in favour of the segregation distorter. The distorter gene has a greater than even chance of ending up in a gamete.

You can see that if a rogue segregation distorter were to arise it would tend, other things being equal, to spread rapidly through the population. The process is called meiotic drive. The rogue gene would spread, not because of any advantage to the individual’s survival or reproductive success, not because of benefit of any kind in the conventional sense, but simply because of its ‘unfair’ propensity to get itself into gametes. We could see meiotic drive as a kind of population-level cancer. A special case of a segregation distorter is the ‘driving Y-chromosome’, that is, a gene on a Y-chromosome whose effect on males is to bias them towards producing Y sperms and therefore male offspring. If a driving Y arises in a population, it tends towards driving it extinct for lack of females: population-level cancer indeed. Bill Hamilton even suggested that we could control the yellow fever mosquito by deliberately introducing a driving male into the population. Theoretically, the population should drastically shrink through lack of females.
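The spread of a segregation distorter can be illustrated with a minimal sketch. It rests on simplifying assumptions of my own, not stated in the text: random mating, a very large population, no cost to carrying the distorter, and a single parameter `k` for the probability that a heterozygote passes the distorter into a gamete. With `k` equal to 0.5 the deal is fair and the frequency does not change. Any `k` above 0.5 drives the gene upwards. The function names are hypothetical.

```python
def next_frequency(p: float, k: float) -> float:
    """Distorter frequency after one generation of random mating.

    p: current frequency of the distorter allele.
    k: probability that a heterozygote transmits the distorter
       (0.5 is fair Mendelian segregation; above 0.5 is drive).
    """
    homozygotes = p * p              # carry two copies, always transmit it
    heterozygotes = 2 * p * (1 - p)  # carry one copy, transmit it with probability k
    return homozygotes + heterozygotes * k


def trajectory(p0: float, k: float, generations: int) -> list[float]:
    """Frequencies from generation 0 to the given generation, inclusive."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], k))
    return freqs


if __name__ == "__main__":
    fair = trajectory(0.01, 0.5, 30)
    driven = trajectory(0.01, 0.9, 30)
    print(f"fair segregation after 30 generations: {fair[-1]:.4f}")
    print(f"distorter (k = 0.9) after 30 generations: {driven[-1]:.4f}")
```

The sketch leaves out everything that might slow a real distorter down, such as harm done to its carriers. It shows only the bare logic: a gene that biases the shuffle in its own favour needs no other advantage to spread.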

Other ways have been suggested to control pests by ‘driving genes’. I’ve already mentioned in Chapter 8 the crass irresponsibility of the 11th Duke of Bedford in introducing grey squirrels, native to America, into Britain. He not only released them in his own estate, Woburn Park, but made presents of grey squirrels to other landowners up and down the country. I suppose it seemed like a fun idea at the time, but the consequence is the wiping-out of our native red squirrel population. Researchers are now examining the feasibility of releasing a driving gene into the grey squirrel gene pool. This would not be carried on the Y-chromosome but would produce a dearth of females in a slightly different way. The authors of the idea are mindful of the need to be careful. We want to drive the grey squirrel extinct in Britain but not in America where it belongs and where it would have stayed but for the Duke of Bedford.

Bad companions, at least in the form of cancers, force themselves upon our forebodings. But for our purposes in this book, it is the gene’s role as good companion that we must thrust to prominence. It remains for the last chapter to pin down exactly what makes them cooperate. Fundamentally, it is, I maintain, the fact that they share an exit route from each body into the next generation.

Good companions dressed for field work: RA Fisher and EB Ford. See endnote for my suspicion that this is a historic photograph.

13 Shared Exit to the Future

Purveyors of scientific wonder like to surprise us with the prodigious – disturbing to some – numbers of bacteria inside our bodies. We’re accustomed to fearing them but most of them are, in the words of Jake Robinson’s title, Invisible Friends. They live mostly in the gut, and estimates of their number vary from 39 trillion to 100 trillion, the same order of magnitude as the number of our ‘own’ cells, where 40 trillion is a round-number estimate. Between a half and three-quarters of the cells in your body are not your ‘own’. But that doesn’t take account of the mitochondria. These miniature metabolic engine-rooms swarm inside our cells and the cells of all eucaryotes (that is, all living creatures except bacteria and archaea). It is now established beyond doubt that mitochondria originated from free-living bacteria. They reproduce by cell division like bacteria, and each has its own genes in a ring-shaped chromosome, again like bacteria. In fact, let us not mince words, they are bacteria: symbiotic bacteria that have taken up residence in the hospitable interior of animal and plant cells. We even know, from DNA-sequence evidence, which of today’s bacteria are their closest cousins. The number of mitochondria in your body is many trillions.
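The proportion of cells that are not our ‘own’ follows directly from the bacterial estimates quoted above. Here is a small sketch of the arithmetic, taking the round figure of 40 trillion for our own cells and leaving mitochondria out of the count. The function name is invented for the illustration.

```python
OWN_CELLS = 40e12  # round-number estimate of 'own' cells, as quoted in the text


def bacterial_fraction(bacteria: float, own: float = OWN_CELLS) -> float:
    """Share of all cells in the body (bacterial plus own) that are bacterial."""
    return bacteria / (bacteria + own)


if __name__ == "__main__":
    low = bacterial_fraction(39e12)    # lower estimate of bacterial numbers
    high = bacterial_fraction(100e12)  # upper estimate of bacterial numbers
    print(f"bacterial share of cells: {low:.2f} to {high:.2f}")
```

The lower estimate gives a share of just under a half, and the upper estimate gives a little over seven-tenths, which is the basis for the rough ‘half to three-quarters’.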

The bacteria that became mitochondria brought with them much essential biochemical expertise, the research and development of which was presumably accomplished long before they became incorporated as proto-mitochondria. Their main role in our cells is the combustion of carbon-based fuel to release needed energy. Not the violent high-speed combustion of fire, of course, but a slow, orderly, trickle-down oxidation. Not only are you a swarm of bacteria, you couldn’t move a muscle, see a sunset, fall in love, whistle a tune, despise a demagogue, score a goal, or invent a clever idea without the unceasing activation of their chemical knowhow, expert tricks cobbled together by natural selection choosing between rival bacteria in a lost pre-Precambrian sea.

The interiors of plant cells swarm with green chloroplasts, which also are descended from bacteria (a different group, the so-called cyanobacteria). Like mitochondria, chloroplasts are bacteria in every sensible meaning of the word. Again like mitochondria, they brought with them a formidable dowry of biochemical wizardry, in this case photosynthesis. Virtually all life on Earth is ultimately powered by energy radiated from the gigantic nuclear fusion reactor that is the sun. It is captured by photosynthesis in chloroplast-equipped solar panels such as leaves, and is subsequently released in the chemical factories that are mitochondria, in all of us. Solar photons that fall on the sea are captured not by leaves but by single-celled green organisms. Whether on land or at sea, solar energy is the base of all food chains. I think the only exceptions are those strange communities whose ultimate source of energy is hot springs, undersea ‘smokers’ and such conduits of heat from the Earth’s interior.

Our mitochondria couldn’t do without us, just as we wouldn’t survive two instants without them. We are joined deep in mutual amity. Our genes and their genes are good companions that have travelled in lockstep over 2 billion years, each naturally selected to survive in an environment furnished by the other. Most of the genes that originated in their bacterial forebears have long since either migrated into our own chromosomes or been laid off as redundant. But why are mitochondria, and some other bacteria, so benign towards us, while other bacteria give us cholera, tetanus, tuberculosis, and the Black Death? My Darwinian answer is as follows. It is an example of the take-home message of the whole chapter. Mitochondrial genes and ‘own’ genes share the same exit route to the future. That is literally true if we are female, or if we for the moment overlook the fact that mitochondria in males have no future. The key to companionable benevolence, I shall show, or its reverse, is the route by which a gene travels from a present body into a body of the next generation.

Mitochondria and chloroplasts may be the earliest examples of bacteria being coopted into animals, but they are not the only ones. Here’s a much more recent re-enactment of those ancient incorporations, and it is highly congenial to the thesis of the gene’s-eye view. The embryonic development of vertebrate eyes requires a protein called IRBP, which facilitates the separation of retinal cells from one another and helps them to see better. In a large survey of more than 900 species, IRBPs were found in every vertebrate examined, plus Amphioxus, a small, primitive creature related to vertebrates, although it lacks a backbone. But of the 685 invertebrate species, the only one with a molecule resembling IRBP was an amphipod crustacean, Hyalella. Among plants, a single species, Ricinus communis, the castor oil plant, has something like an IRBP. And there’s a little cluster of fungi too. Molecules resembling IRBPs are ubiquitous among bacteria.

A family tree of IRBP-like molecules shows a richly branched pedigree among bacteria, paralleling that of the vertebrates in which they live, both pedigrees springing from a single point. The isolated pop-ups (crustacean, fungi, and plant) also spring from within the bacterial tree, but widely separated parts of it. This is good evidence of horizontal gene transfer from various bacteria into the eucaryote genome. The evidence strongly suggests that vertebrate IRBPs are ‘monophyletic’, all descended from a single ancestor, which means a single jump from a bacterium right at the base of vertebrate evolution. Ever since that event, the genes concerned have been passed vertically down the generations. This is like the bacteria that became mitochondria, although mitochondrial ancestors were whole bacteria, not single genes.

I want to give a general name to bacteria that are transmitted from host to host in host gametes: verticobacter, because they pass vertically down the generations. The ancestors of mitochondria and of chloroplasts are prime examples of verticobacters. Verticobacters can infect another organism only by riding inside its gametes into its children. By contrast, a typical ‘horizontobacter’ might pass by any route from host to host. If it lives in the lungs, for instance, we may suppose its method of infection is via droplets coughed or sneezed into the air and breathed in by its next victim. A horizontobacter doesn’t ‘care’ whether its victim reproduces. It only ‘wants’ its victim to cough (or sneeze, or make bodily contact by hands, lips, or genitals), and it works to that end – ‘works’ in the sense that its genes have extended phenotypic effects on the host’s body and behaviour, driving the host to infect another host. A verticobacter, by contrast, ‘cares’ very much that its ‘victim’ shall successfully reproduce, and ‘wants’ it to survive to reproduce. Indeed, ‘victim’ is scarcely the appropriate word, which is why I protected it behind quotation marks. This is, of course, because a verticobacter’s ‘hope’ of future transmission lies in the offspring of the host, exactly coinciding with the ‘hopes’ of the host itself. Therefore, if a verticobacter’s genes have extended phenotypic effects on the host, they will tend to agree with the phenotypic effects of the host’s own genes. In theory a verticobacter’s genes should ‘want’ exactly the same thing as the host’s genes in every detail.

The pertussis (whooping cough) bacterium is a good example of a horizontobacter. It makes its victims cough, and it passes through the air to its next victim, in droplets emitted by the cough. Cholera is another horizontobacter. It exits the body via diarrhoea into the water supply, whence it ‘hopes’ to be imbibed by somebody else, drinking contaminated water. It doesn’t ‘care’ if its victims die, and it has no ‘interest’ in their reproductive success.

The notion of a parasite’s ‘wanting’ its victim to do something needs explaining, and this again is where the extended phenotype comes in, as promised at the end of Chapter 8. The parasitology literature is filled with macabre stories of parasites manipulating host behaviour, usually changing the behaviour of an intermediate host to enable transmission to the next stage in the parasite’s complicated life cycle. Many of these stories concern worms rather than bacteria, but they convey the principle I am seeking to get across. ‘Horsehair worms’ or ‘gordian worms’, belonging to the phylum Nematomorpha, live in water when adult, but the larvae are parasitic, usually on insects. The insect hosts being terrestrial, the gordian larva needs somehow to get into water so it can complete its life cycle as an adult worm. Infected crickets are induced to jump, suicidally, into water. An infected bee will dive into a pond. Immediately the gordian worm bursts out and swims away, the crippled bee being left to die. This is presumably a real Darwinian adaptation on the part of the worm, which means that there has been natural selection of worm genes whose (‘extended’) phenotypic effect is a change in insect behaviour.

Here’s another example, this time involving a protozoan parasite, Toxoplasma gondii. The definitive host is a cat, and the intermediate host is a rodent such as a rat. The rat is infected via cat faeces. Toxoplasma then needs the infected rat to be eaten by a cat, to complete its life cycle. It insinuates itself into the rat’s brain and manipulates the rat’s behaviour in various ways to that end. Infected rats lose their fear of cats, specifically their aversion to the smell of cat urine. Indeed, they become positively attracted to cats, though apparently not to non-predatory animals, or predators that don’t attack rats. There is some evidence that they lose fear in general, owing to increased production of the hormone testosterone. Whatever the details, it’s reasonable to guess that the change in rat behaviour is a Darwinian adaptation on the part of the parasite. And therefore an extended phenotype of Toxoplasma genes. Natural selection favoured Toxoplasma genes whose extended phenotypic effect was a change in rat behaviour.

The infected snail’s bulging eyes are a tempting target for birds

Leucochloridium is a fluke (flatworm), parasitic on birds. Its intermediate host is a snail, and it needs to transfer itself from snail to bird. The snails that it infects are largely nocturnal, while the birds who are the next host feed by day. The worm manipulates the behaviour of the snail to make it go out by day. But that is only the beginning of the snail’s troubles. One of the life-history stages of the worm invades the eye stalk of the snail, which swells grotesquely, and seems to pulsate vividly along its length.

This is said to make the eye stalk look like a little crawling caterpillar. Be that as it may, it certainly renders the eye stalks conspicuous, and birds readily peck them off. Infected snails also move around more actively than unparasitised ones. The snail is not killed but only blinded. It is able to regenerate its eye stalks to pulsate another day and perhaps be again plucked off. The fluke, for good measure, castrates its snail victim. And that’s an interesting story in its own right. ‘Parasitic castration’ is common enough to be a named thing. It is practised by a wide variety of parasites from around the animal kingdom, including protozoa, flatworms, insects, and various crustaceans, among them Sacculina, the parasitic barnacle that I introduced in Chapter 6 and promised to return to.

Sacculina is perhaps the most extreme example of the ‘degenerative’ evolution typical of parasites. Darwin, in his monographs on barnacles, which distracted him for eight of the twenty years when he might have published on evolution, misdiagnosed the affinities of Sacculina. And who can blame him? Just take a look at it. The externally visible part of Sacculina is a soft bag clinging to the underside of a crab. Most of the ‘barnacle’ consists of a branching root system permeating the inside of the unfortunate crab’s body. Eventually, it fills the body so completely that if you could sweep away the crab and leave only the Sacculina, this is what you might see.

This is not a crab

How do we know that this system of branching rootlets, this sprawling entity that looks like a plant or fungus, is really a barnacle? How do we even know it’s a crustacean? The various larval stages of the life cycle give it away. The nauplius larva is followed by the cyprid larva, and both are unmistakeably crustacean. As if final clinching were needed, Sacculina’s genome has now been sequenced. ‘It is written’, as the Muslims say: ‘Crustacean’.

Sacculina larvae

The first organs that Sacculina attacks are the crab’s reproductive organs. This is the ‘parasitic castration’ that I mentioned above. Barnacles themselves are sometimes castrated by parasitic crustaceans: marine isopods related to woodlice. So, what is the point of parasitic castration? Why would a parasite head straight for the gonads of its host, before eating other organs?

As with all animals, the host’s ancestors have been naturally selected to fine-tune a delicate balance between the need to reproduce (now) and the need to survive (to reproduce later). A parasite such as Sacculina, however, has no interest in assisting its host to reproduce. This is because its genes don’t share the host genes’ exit route to the future. Sacculina genes ‘want’ to shift the host’s ‘balance’ towards surviving, to carry on feeding the parasite. Like a docile, castrated ox being fattened up, the crab is forced by the parasite to renounce reproduction and become a maintained source of food.

The situation reverses in those cases where parasites – ‘verticoparasites’ – pass to the next host generation in the gametes of the host. Verticoparasites infect only the offspring of their individual hosts rather than potential hosts at large. The genes of a verticoparasite share the ‘exit route’ of the host genes, so their extended phenotypic effects will agree with the host genes’ phenotypic effects. Exercise our usual cautious licence to personify, and consider the ‘preferred options’ of a verticoparasite such as a verticobacter. It travels inside the eggs of a host directly into the host’s child. Here, the interests of parasite and host coincide, and their genes ‘agree’ about the optimal host anatomy and behaviour. Both ‘want’ the host to reproduce, and survive to reproduce. Once again, if the genes of vertically transmitted parasites have extended phenotypic effects on their hosts, those effects should coincide, in perfect agreement and in every detail, with the phenotypic effects of the host animal’s ‘own’ genes.

Mitochondria are an extreme example of a verticoparasite. Long transmitted vertically down the generations inside host eggs, they became so amicably cooperative that their parasitic origins are hard to spot, and were long overlooked. A horizontoparasite such as Sacculina has opposite ‘preferences’. It has no ‘interest’ in its host’s successful reproduction. Whether or not a horizontoparasite ‘cares’ about its host’s survival depends on whether it can benefit from it, presumably, as in the case of Sacculina, by feeding on the living host. If, by castration, it can shift the balance of the host’s internal economy away from reproduction and towards survival, so much the better.

The tapeworm Spirometra mansonoides doesn’t castrate its mouse victims but it achieves a similar result. It secretes a growth hormone, which makes them grow fatter than normal mice. And fatter than the optimum achieved by natural selection of mouse genes seeking a balance between growth and reproduction. Tribolium beetles normally develop through a succession of six larval moults, increasing in size, before they eventually change into an adult. A protozoan parasite, Nosema whitei, when it infects Tribolium larvae, suppresses the change to adult. Instead, the larva continues to grow through as many as six extra larval moults, ending up as a giant grub, weighing more than twice as much as the maximum weight of an uninfected larva. Natural selection has favoured Nosema genes whose extended phenotypic effect was a dramatic doubling in Tribolium fatstock weight, achieved at the expense of beetle reproduction.

A small tapeworm, Anomotaenia brevis, needs to get into its definitive host, a woodpecker. It does so via an intermediate host, an ant of the species Temnothorax nylanderi, which has the habit of collecting woodpecker droppings to feed to its larvae. Tapeworm eggs are often present in the droppings, and can therefore find themselves being eaten by ant larvae. The parasite then has an interesting effect on the ant’s behaviour when the ant becomes adult. The parasitised ant refrains from work and is fed by unparasitised workers. Parasitised ants also live longer, up to three times longer, than normal ants. This increases their chance of being eaten by a woodpecker – which benefits the tapeworm.

There are parasitic flukes who persuade their snail victim to develop a thicker shell than normal. Shells are presumably an adaptation to protect the snail and prolong its life. But a shell, like any other part of the body, is costly to make. In the personal economics of snail development, the price of thickening the shell is presumably paid out of non-shell pockets, such as those committed to reproduction. Natural selection of snails has built up a delicate balance between survival and reproduction. Too thin a shell jeopardises survival. Too thick a shell, although good for survival, takes economic resources away from reproduction. The fluke, not being a vertically transmitted parasite, ‘cares nothing’ for snail reproduction. It ‘wants’ the snail to shift its priorities towards individual survival. Hence, I suggest, the thickened shell. In extended phenotype language, natural selection favours genes in the fluke that exert a phenotypic effect on the snail, upsetting its carefully poised balance. The thickening of the shell is an extended phenotype of fluke genes, benefiting them but not the snail’s own genes. This case is interesting as an example of a parasite apparently – but only apparently – doing its host a good turn. It strengthens the snail’s armour and perhaps prolongs its life. But if that were really good for the snail, the snail would do it anyway, without the ‘help’ of a parasite. The snail balances a finely judged internal economy. Too lavish spending on survival impoverishes reproduction. The parasite unbalances the snail’s economy, pushing it too far in the direction of survival at the expense of reproduction.

According to the gene’s-eye view of life that I advocate, genes take whatever steps are necessary to propagate themselves into the distant future. In the case of ‘own’ vertically transmitted genes, the steps taken are phenotypic effects on the form, workings, and behaviour of ‘own’ bodies. Genes take those steps because they inherit the qualities of an unbroken, vertically travelling line of successful genes that took the same steps through the ancestral past – that is precisely why they still exist in the present. All of our ‘own’ genes are good companions that agree with each other about what the best steps are. Everything that helps one member of the genetic cartel into the next generation automatically helps all the others. All ‘agree’ about the goal of whatever it is they variously do to affect the phenotype. And why do they agree? Precisely because, in every generation, they share with each other the same exit route into the next generation. That exit route is the gametes – the sperms and eggs – of the present generation. And now we return to verticobacters and other verticoparasites. They have exactly the same exit route as the host’s own genes, and therefore exactly the same interests at heart.

The genes of a verticobacter look back at the same history of ancestral bodies as its host’s own genes. Verticobacter genes have the same reason to behave as good companions towards our own genes as our own genes have towards each other. If an animal benefits from fast-running legs and efficient lungs for running, then its internal verticobacters will also benefit from the same things. If a verticobacter has an extended phenotypic effect on running speed, that effect will be favoured only if it is positive from the organism’s point of view too. The interests of host and bacterium coincide in every particular. A horizontobacter, on the other hand, might be more likely to ‘want’ its victim, when pursued, to cough with exhaustion – coughing being exactly what the horizontobacter needs in order to get itself passed on to another victim. Or another horizontobacter might want its victim to mate more promiscuously than the optimum ‘desired’ by the host’s own genes, thereby maximising contact with another host, and hence opportunities for infection. An extreme horizontobacter might devour the host’s tissues completely, reducing it to a bag of spores which eventually bursts, scattering them to the winds, where they may find fresh hosts to conquer.

A verticobacter ‘wants’ its victims to reproduce successfully (which means, as we saw earlier, that ‘victims’ is not really an appropriate word). Its ‘hopes’ for the future precisely coincide with those of its host. Its genes cooperate with those of the host to build a strong body surviving to reproductive age. Its genes help to endow the host with whatever it takes to survive and reproduce; with skill in building a nest, diligence in gathering food for the infants, success in fledging them at the right time to prepare to reproduce the next generation, and so on. If a verticobacter happens to have an extended phenotypic effect on a host bird’s plumage, natural selection could favour verticobacter genes that brighten the feathers to make the host more attractive to the opposite sex. Verticobacter genes and host genes will ‘agree’ in every respect.

Exactly the same argument applies to viruses, of course. And now we approach the twist in the tail of this chapter and this book. Any virus that travels from human (for example) generation to generation via our sperms or eggs will have the same ‘interests’ as our ‘own’ genes. Whatever colour, shape, behaviour, biochemistry is best for our ‘own’ genes will also be best for (let’s call them) verticoviruses. Verticovirus genes will become good companions of our own genes, accounting for the familiar fact that viruses can help us as well as harm us. Horizontovirus genes, by contrast, don’t care if they kill their victims, so long as they get passed on to new victims by their route of choice – coughing, sneezing, handshaking, kissing, sexual intercourse, whatever it is.

A good example of a horizontovirus is the rabies virus. It is transmitted via the foaming saliva of its victims, whom it induces to bite other animals thereby infecting their blood. It also leads its victims, for example ‘mad’ dogs, to roam far and wide (and out in the midday sun), rather than stay, perhaps sleeping, within their normal home range. This helps the virus by spreading it over a larger geographical area.

What would be a good real example of a verticovirus? It has been estimated that about 8 per cent of the human genome actually consists of viral genes that have, over the millions of years, become incorporated. Among these ‘retroviruses’, some are inert but others have effects that are beneficial. For example, it has been suggested that the evolutionary origin of the mammalian placenta was the result of a beneficial cooperation with an ‘endogenous’ retrovirus that succeeded in writing itself into the nuclear DNA. LP Villarreal, a leading virologist, has gone so far as to suggest that ‘viruses were involved in most all major transitions of host biology in evolution’, and ‘From the origin of life to the evolution of humans, viruses seem to have been involved … So powerful and ancient are viruses, that I would summarize their role in life as “Ex virus omnia” (from virus everything).’

And now, can you see where I am finally going in this chapter? In what sense are our ‘own’ genes different from benign, good companion viruses? Why not push to the ultimate reductio? Why not see the entire genome as a huge colony of symbiotic verticoviruses? This is not a factual contribution to the science of virology. Nothing so ambitious. It’s more like an expansion of what we might mean by ‘virus’ – rather as ‘extended phenotype’ was an expansion of what we might mean by ‘phenotype’. Our ‘own’ genes are verticoviruses, good companions held together and cooperating because they share the same exit route to the next generation. They cooperate in the shared enterprise of building a body whose purpose is to pass them on. Viruses as we normally understand the word, and computer viruses, are algorithms that say ‘Duplicate me’. An elephant’s ‘own’ genes are algorithms that say, in the words of an earlier book of mine, ‘Duplicate me by the roundabout route of building an elephant first’. They are algorithms that work only in the presence of the other genes in the gene pool. They are equivalent to an immense society of cooperating viruses.

I’m not just saying that our genome consists of ‘endogenous retroviruses’ (ERVs) that were once free, infected us, and then became incorporated into the chromosomes. That is true in some cases and it is important, but it’s not what this final chapter is suggesting. Lewis Thomas also didn’t mean what I now mean, although I would love to borrow his poetic vision in pushing the climax of my book.

‘We live in a dancing matrix of viruses; they dart, rather like bees, from organism to organism, from plant to insect to mammal to me and back again, and into the sea, tugging along pieces of this genome, strings of genes from that, transplanting grafts of DNA, passing around heredity as though at a great party.’

The phenomenon of ‘jumping genes’, too, is congenial to my vision of a genome as a cooperative of verticoviruses. Barbara McClintock won a Nobel Prize for her discovery of these ‘mobile genetic elements’. Genes don’t always hold their place on a particular chromosome. They can detach themselves, then splice themselves in at a distant place in the genome. Some 44 per cent of the human genome consists of such jumping genes or ‘transposons’. McClintock’s discovery of jumping genes conjures a vision of the genome as a society, like an ants’ nest: a society of viruses held together only by their shared exit route, and hence shared future and shared actions calculated to secure it.

My suggestion is that the important distinction we need to make is not ‘own’ versus ‘alien’ but vertico versus horizonto. What we normally call viruses – HIV, coronaviruses, influenza, measles, smallpox, chickenpox, rubella, rabies – are all horizontoviruses. That, precisely, is why many of them have evolved in a direction that damages us. They pass from body to body, via routes that are all their own, by touch, in the breath, by genital contact, in saliva, or whatever it is, and not via the gametic routes with which our own genes traverse the generations. Viruses that share the same genetic destiny as our own genes have no reason to dissent from good companionship. On the contrary. They stand to gain from the survival and successful reproduction of every shared body they inhabit, in exactly the same way as our own genes do. They deserve to be considered ‘our own’ in an even more intimate sense than mitochondria, for mitochondria pass down the female line only. And, from this point of view, our ‘own’ genes are no more ‘own’ than a retrovirus that has become incorporated into one of our chromosomes and stands to be passed on to the next generation by exactly the same sperm or egg route as any other genes in the chromosome.

I cannot emphasise strongly enough that I am not suggesting that all our genes were once independent viruses that later ‘came in from the cold’ and, as retroviruses, ‘joined the club’ of our own nuclear genome. That is known of some 8 per cent of our genes, it may be true of many more, it is interesting and important, but it is not what I am talking about here. My point is rather to downplay the distinction between ‘own’ and ‘other’, and to emphasise instead the distinction between vertico and horizonto.

Our entire genome – more, the entire gene pool of any species of animal – is a swarming colony of symbiotic verticoviruses. Once again, I’m not talking only about the 8 per cent of our genome that consists of actual retroviruses, but the other 92 per cent as well. They are good companions precisely because they are vertically transmitted, and have been for countless generations. This is the radical conclusion towards which this chapter has been directed. The gene pool of a species, including our own, is a gigantic colony of viruses, each hell-bent on travelling to the future. They cooperate with one another in the enterprise of building bodies because successive, temporary, reproduce-and-then-die bodies have proved to be the best vehicles in which to undertake their vertical Great Trek through time. You are the incarnation of a great, seething, scrambling, time-travelling cooperative of viruses.
2024-11-07
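Zhang Yinghong: Field Research on Villages around Beijing

Research Report on Dongjiangzhouying Village, Zhaoquanying Town, Shunyi District

I. Basic Village Profile

Dongjiangzhouying Village (东绛洲营村) in Zhaoquanying Town, Shunyi District, is a fairly ordinary northern village in the Beijing suburbs. Its total area is about 1,600 mu, of which a little over 300 mu has been requisitioned. At the end of 2018 the village had 370 permanent residents in 110 households. Nearly 100 of them were migrants, mostly people who work at nearby Airport Zone enterprises and rent rooms in the village. The village has 680.3 mu of cultivated land (508 mu of it basic farmland), 186 mu of forest land (including 80 mu of plain afforestation), 15 mu of orchard land, and 40.47 mu of water surface. In 2000 the village completed the confirmation of contracted land rights, using 31 December 1999 as the reference date; 289 registered residents took part, and each received confirmed rights to 3.03 mu on average.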

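In 2018 the village collective's total income was 2.7989 million yuan (up 489,900 yuan from 2.309 million yuan in 2017), of which fiscal subsidies and awards came to 1.3837 million yuan (down 172,300 yuan from 1.556 million yuan in 2017). Total collective expenditure for the year was 2.4065 million yuan (up 218,500 yuan from 2.188 million yuan in 2017). Farmers' per capita net income in 2018 was 31,600 yuan, in the upper-middle range for the district.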
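The year-over-year changes quoted above can be rechecked with simple subtraction. The following is a minimal sketch, not part of the original report; the dictionary keys and the helper name `yoy_change` are illustrative, and the amounts are kept in the report's own unit of 10,000 yuan (万元).

```python
# Amounts in units of 10,000 yuan (万元), as (2017 value, 2018 value),
# copied from the paragraph above. Names are illustrative only.
figures = {
    "total collective income": (230.90, 279.89),
    "fiscal subsidies and awards": (155.60, 138.37),
    "total collective expenditure": (218.80, 240.65),
}


def yoy_change(prev: float, curr: float) -> float:
    """Return the change from prev to curr, rounded to two decimals."""
    return round(curr - prev, 2)


changes = {name: yoy_change(prev, curr) for name, (prev, curr) in figures.items()}

for name, delta in changes.items():
    # A leading "+" marks an increase, "-" a decrease.
    print(f"{name}: {delta:+.2f}")
```

The three differences agree with the increases and the decrease stated in the text.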

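Over the past ten years the village has seen three small-scale land requisitions. The first, in 2010, took 153 mu for the construction of Airport Zone C, at 90,000 yuan per mu. The second, in 2013, took a little over 150 mu, again for Airport Zone C, at 110,000 yuan per mu. The third, in 2017, took nearly 5 mu for the Beijing-Shenyang high-speed passenger railway, at 200,000 yuan per mu. For the two Airport Zone C requisitions, 37 villagers were designated for conversion from agricultural to non-agricultural household registration (农转非) under Beijing Municipal Government Order No. 148 of 2004: 16 in the first round and 21 in the second. The railway requisition will give the village one conversion quota for a working-age resident. The village holds more than 14.98 million yuan in requisition compensation in a dedicated account and may use the interest on it to fund villagers' welfare. In 2018 the interest was about 210,000 yuan, of which 70% went to villagers' welfare and 30% was retained for collective use.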
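As a rough illustration of the 70/30 split of the 2018 interest, the sketch below works from the report's approximate figures (about 210,000 yuan of interest on a balance of a little over 14.98 million yuan). The implied rate of return is my own back-of-envelope derivation and is not stated in the report; all variable names are illustrative.

```python
from fractions import Fraction

# Approximate figures from the report (both are rounded in the source).
interest_yuan = 210_000        # 2018 interest, "about 210,000 yuan"
principal_yuan = 14_980_000    # compensation held, "more than 14.98 million yuan"

# Stated allocation rule: 70% to villagers' welfare, 30% kept by the collective.
welfare_share = Fraction(70, 100)
collective_share = Fraction(30, 100)

welfare_yuan = interest_yuan * welfare_share
collective_yuan = interest_yuan * collective_share

# Derived, not stated in the report: interest as a share of the balance.
implied_rate = interest_yuan / principal_yuan

print(f"villagers' welfare: {int(welfare_yuan):,} yuan")
print(f"collective use:     {int(collective_yuan):,} yuan")
print(f"implied annual return: {implied_rate:.2%}")
```

Exact fractions are used for the shares so that the two parts always add back to the full interest amount.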

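The village's main industries are seed production, flowers, and nurseries. Flower growing centers on Phalaenopsis orchids, which cover more than 150 mu, a sizeable operation. Confirmed land is transferred in two ways: villagers transfer about 200 mu on their own, and the rest is handed over as a block to the village collective, which then leases it out. The fee for land transferred to the collective was set at 1,200 yuan per mu in 2014 and has not been adjusted since. In 2018 the village paid villagers 760,000 yuan in land-transfer fees. The village has a labor force of more than 160 people, almost all of whom work outside the village. Only two households grow nursery stock themselves, on less than 10 mu; everyone else has transferred their land.

The village once had three industrial enterprises making stamped parts. They employed twenty to thirty villagers and, counting people from neighboring villages, some sixty to seventy in all. Besides land rent on the roughly 27 mu the three enterprises occupied, the village received more than 200,000 yuan a year in tax rebates from them. All three were shut down in 2017 following environmental inspections, and the village now has no industrial enterprises.

The village has 26 Communist Party members, 4 of them retirees. Its only three low-income households rose above the low-income threshold in 2018. The village's two subsistence-allowance (低保) recipients, Dong Keli and Wang Changqing, both have intellectual disabilities and receive 1,485 yuan a month in household support funds.

After demolishing illegal structures in 2018, the village stepped up environmental improvement and greening. To the eye, the village is well greened, its layout is orderly, and it is clean. The village has also set up a villagers' welfare scheme, which paid out 488,300 yuan in 2018.

Surnames in the village include Fang, Dong, Zhang, and Ding. I wandered into the home of a villager, Fang Xiaoxing, and chatted with him. Born in 1965, he lives there with only his wife; their only daughter has married into Tongzhou District, and their 10-odd mu of land has been transferred to the village collective. Fang is one of the village's two cleaners. The village also has six full-time patrol staff who look after public security and the environment. Village Party secretary Zhang Yajun has held the post for 25 years, and under him the village has won a number of honors, including Capital Green Village, Capital Civilized Village, and Beijing Democracy and Rule of Law Demonstration Village.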

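II. Main Problems

My preliminary research suggests that the village's problems fall into three groups: specific problems, development problems, and deep-seated problems.

(1) Specific problems

First, parking. Many villagers own cars, but no formal parking spaces have been marked out, so there is haphazard and irregular parking, which mars the village's appearance and causes inconvenience.

Second, construction waste. The village produces two main kinds of waste, household and construction. Household waste is sorted by households, collected by the village, and hauled away by the town, with collection and transport twice a day, so it is no longer a problem. The difficulty now is the spoil and debris from building work. Some construction-waste sites refuse loads with a high soil content, and ordinary garbage trucks are not allowed on the roads, so dedicated construction-waste vehicles are needed. These problems are fairly widespread.

Third, compensation for coal-to-electricity equipment. In 2015 the village was made a pilot village for the coal-to-electricity conversion, and each household spent 7,000 yuan on the equipment. When the program was rolled out across the town in 2016 and 2017, the equipment was delivered to villagers entirely free. Villagers here found this plainly unfair. To quiet the complaints, the village compensated each household from collective funds, but higher-level authorities have yet to reimburse the village.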

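(2) Development problems

First, protected agriculture. The one-size-fits-all clearance of "greenhouse houses" (大棚房) since 2018 has dealt a devastating blow to the village's protected agriculture. The village's view is that "greenhouse households" who build housing under the guise of protected agriculture should indeed be strictly cleared out, but that greenhouses genuinely used for growing vegetables need a certain proportion of ancillary workrooms to operate normally. At the end of August 2018, the greenhouses and buildings put up by the Shunyi District Sannong Research Association, which had been based in the village, were all demolished. One rule requires that work paths inside vegetable greenhouses be no more than 60 cm wide, yet the farm handcarts used there are often wider than 60 cm. Such policies are plainly out of touch with reality.

Second, pig raising. Since the outbreak of African swine fever in 2018, the village has closed 2 pig-raising sites and sent 202 pigs for harmless disposal. No household in the village now raises pigs, and villagers' pig raising is strictly restricted.

Third, idle factory land. The three industrial enterprises closed for environmental reasons occupy nearly 30 mu, which now lies entirely idle. How to put this collective construction land inside the village to good use and strengthen the collective economy is a major question.

Fourth, farmer cooperatives. The village has not yet set up a specialized farmers' cooperative, so in flowers and other rural industries farmers cannot be well organized to take part and develop.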

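(3) Deep-seated problems

First, population aging. In 2018 the village had 65 people aged 60 or over, rising to 71 in 2019, an aging rate of 19.18%; the problem is pronounced, and it is also widespread. Ensuring that older people are provided for is a major livelihood issue.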
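The aging rate here is the share of residents aged 60 or over in the total population. The report does not say which population base it used for 2019, so the sketch below assumes the 370 permanent residents recorded at the end of 2018 as the denominator for both years; that assumption, and the function name `aging_rate`, are mine.

```python
def aging_rate(aged_60_plus: int, population: int) -> float:
    """Percentage of the population aged 60 or over."""
    return 100 * aged_60_plus / population


# Assumption: 370 permanent residents (end-2018 figure) as the base for both years.
ASSUMED_POPULATION = 370

rate_2018 = aging_rate(65, ASSUMED_POPULATION)
rate_2019 = aging_rate(71, ASSUMED_POPULATION)

print(f"2018: {rate_2018:.2f}%")
print(f"2019: {rate_2019:.2f}%")
```

Under this assumption the 2019 figure comes out within a hundredth of a percentage point of the 19.18% the report gives; the small gap could reflect rounding or a slightly different population base.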

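Second, hollowing-out. Almost all young and middle-aged villagers have left to work elsewhere, and those who remain are mostly elderly. Walking around the village, we found it tidy, quiet, and well greened, but saw no young people, only some elderly residents sunning themselves or chatting. Hollowing-out is likewise widespread. A village without young people can hardly have vitality or sustained development.

Third, modernizing governance. The village collective economic organization plays no very visible role in rural governance, and villagers show little initiative or creativity in taking part. The institutionalization, standardization, and proceduralization of village governance need further improvement.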

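III. Recommendations

Some of the village's problems can be solved by the village itself; others cannot, and require joint effort at the level of the state, the government, and society.

(1) On the specific problems: first, contact the relevant traffic authorities to mark out parking spaces properly and manage them in an orderly way. Second, raise the matter with the higher-level Party committee, government, and relevant departments so that the collection and disposal of construction waste can be handled in a coordinated way. Third, keep pressing the higher-level Party committee and government to resolve the subsidy for coal-to-electricity equipment costs.

(2) On the development problems: first, change the one-size-fits-all approach to clearing greenhouse houses. Following the principle of seeking truth from facts, firmly stop illegal greenhouse-house construction under cover of protected agriculture, while also starting from the realities of protected agriculture and making policies that genuinely help it develop. Second, villagers should be allowed to raise pigs if they wish; excessive administrative interference should be reduced, farmers' ways of working and living respected, and bureaucratism and formalism in rural work overcome. Third, the newly revised Land Administration Law allows collective commercial construction land that conforms to planning and is lawfully registered to be handed over, by transfer, lease, or similar means, for direct use by entities or individuals outside the collective economic organization, with the consent of at least two-thirds of the members of the collective economic organization or of the villagers' representatives. The village can use this new provision to put its idle factory land to work for industrial development. Fourth, in line with the village's seed and flower industries, set up specialized farmers' cooperatives to raise farmers' degree of organization, expand employment, and increase incomes.

(3) On the deep-seated problems: first, fully abolish the long-standing family-planning policy of population control, truly return reproductive autonomy to village families, speed up the creation of a policy system that encourages childbearing, and effectively lower the costs of childbearing and education. Strengthen the policy and institutional system for old-age support, set up a seniors' canteen in the village as soon as possible to meet older residents' meal needs, and build a sound system of elder care services and social welfare for the elderly. Second, speed up the dismantling of the urban-rural dual system, promote rural revitalization, achieve integrated urban-rural development and the free two-way flow of people and other factors between city and countryside, and create a policy environment in which people are free to choose to return to the village. Third, innovate in building the collective economic organization so that it plays its due role in rural governance; improve the Party-led rural governance system combining self-governance, rule of law, and rule of virtue; and lock power in the cage of institutions, so that the highly concentrated public power of the village is subject to supervision and checks by law and by villagers, and village corruption is prevented and punished. In particular, willful and unaccountable public power must be prevented from destroying agriculture, disrupting villages, and suppressing villagers. The cultural genes and inner laws of rural development should be held in awe, farmers' autonomy in production and daily life respected, social fairness and justice upheld, and good rural governance pursued.

13 September 2019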

从“蚁族”聚居村到现代都市区——北京市海淀区唐家岭村城市化转型的调查与思考

唐家岭村隶属于北京市海淀区西北旺镇，20世纪90年代以来北京快速的城市化，推动了唐家岭村从传统乡村到城乡结合部，再到现代大都市社区的历史性飞跃。

一、基本情况：曾经著名的“蚁族”聚居村

2009年底，唐家岭村户籍人口3364人，其中非农业户籍人口2039人、农业户籍人口1325人，外来人口5万多人。外来人口相当一部分是在唐家岭村附近中关村企业上班的大学毕业生，他们被称为“蚁族”。

2010年5月，作为全市城乡结合部50个重点改造村之一的唐家岭村，以村民代表大会方式通过自主制定的全村腾退改造方案。2018年10月，唐家岭村委会建制被撤销，结束了村居并存的历史。截至2020年底，唐家岭社区常住户籍人口1335户3550人，辖区内居住总人口12939人；村域总面积483.06公顷，其中基本农田19.26公顷、园地141.41公顷、林地19.26公顷、规划用地231.58公顷、交通运输用地52.42公顷、水域及水利设施用地16.49公顷、其他用地2.64公顷。

二、唐家岭村城市化转型的主要做法

（一）实行旧村腾退搬迁上楼，集中建设唐家岭新城

2010年唐家岭地区正式启动整体改造工程，2012年7月开始回迁上楼。根据腾退安置政策，唐家岭村安置房面积按村民原有宅基地面积1:1置换。被腾退搬迁户家庭人均面积不足50平方米的，可按人均50平方米补足。唐家岭村腾退搬迁方案还规定了相关奖励政策。村民腾退旧村建成的唐家岭新城，占地面积11.7公顷，总建筑面积约为34.74万平方米，共18栋住宅3159套，居住户籍人口1335户。

（二）推进农村集体产权制度改革，成立股份经济合作社

唐家岭村以2010年12月31日为时点进行了清产核资，确认唐家岭村集体资产总额455412161.49元，净资产45514861.17元。唐家岭村股权设置包括集体股与个人股，集体股占10%、个人股占90%，全村共有1796人享有基本份额，股东去世与继承人合并入股，最终入股股东1791人。

2016年，唐家岭村经济合作社转制成立唐家岭村股份经济合作社。2019年12月，唐家岭村股份经济合作社完成农村集体经济组织登记赋码换证工作。2020年，唐家岭村股份社股东每年每股分红高达4万元。

（三）实行整建制农转非，实现农民身份市民化

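Since the start of the 21st century, 1,710 mu of Tangjialing Village's collective land has been requisitioned, and 4,170 mu of collective land remains. Since the Beijing Measures for Compensation and Resettlement in Land Requisition for Construction (《北京市建设征地补偿安置办法》) took effect on 1 July 2004, both requisition-linked conversion and whole-village conversion from agricultural to non-agricultural registration in Tangjialing have followed its policies of "conversion with every requisition" (逢征必转) and "social insurance with every requisition" (逢征必保). Before 2006, 306 villagers were converted through requisition. In 2006, two requisitions converted 473 and 200 working-age villagers respectively, at a labor-force conversion cost of 33.768 million yuan. In 2011, 921 people were converted, at a labor-force conversion cost of 64.316 million yuan. In December 2015 the village carried out its last, whole-village conversion of 280 people, at a labor-force conversion cost of 7.09 million yuan. Since 2006 the village has received requisition approval for 1,703.662 mu, with compensation of 1,283,348,050 yuan (about 1.28 billion yuan).

Requisition-linked and whole-village conversion together covered 2,180 people at a total cost of 299.45 million yuan, or 137,000 yuan per person. Of these, 1,871 were working-age villagers, whose conversion cost 134.75 million yuan in all, or 72,000 yuan per person. The other 309 were over-age converts (超转人员), for whom 164.7 million yuan was paid, or 533,000 yuan per person. Because Tangjialing converted relatively early, and because the village gave priority to converting its over-age members in the early stages to save cost, the figure of 533,000 yuan per person looks comparatively modest. Judging from the six villages in Xibeiwang Town, Haidian District, that underwent whole-village conversion in 2020, however, the cost for a single over-age convert has run as high as 7.66 million yuan.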
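The per-capita conversion costs quoted above follow from dividing each group's total cost by its headcount. Below is a minimal sketch of that arithmetic, with amounts in the report's unit of 10,000 yuan (万元); the group labels and variable names are illustrative and do not come from the report.

```python
# (headcount, total cost in 10,000 yuan) for each group named in the text.
groups = {
    "working-age converts": (1871, 13475),
    "over-age converts": (309, 16470),
}

# Conversion batches listed in the preceding paragraph: before 2006,
# two in 2006, 2011, and the final whole-village batch in December 2015.
batches = [306, 473, 200, 921, 280]

total_people = sum(people for people, _ in groups.values())
total_cost = sum(cost for _, cost in groups.values())

per_capita = {
    name: round(cost / people, 1) for name, (people, cost) in groups.items()
}
overall_per_capita = round(total_cost / total_people, 1)

print(f"people: {total_people}, cost: {total_cost} (10,000 yuan)")
for name, value in per_capita.items():
    print(f"{name}: {value} per person")
print(f"overall: {overall_per_capita} per person")
```

The group totals add up to the 2,180 people and 299.45 million yuan given in the text, and the batch headcounts from the preceding paragraph sum to the same 2,180.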

（四）创新集体土地入市方式，率先建设集体公共租赁住房

2012年，唐家岭村经批准，在全国率先开展利用集体产业用地建设公租房试点。唐家岭村公租房建筑面积73749.92平方米，共建成1498套公租房。按照有关要求，唐家岭公租房项目纳入政府保障性住房规划和年度计划，按照每平方米每月55元的价格整体租赁给海淀区住房保障办公室。2017年，唐家岭公租房项目正式移交海淀区住保办统一管理和配租。截至2021年底，唐家岭公租房居住率达到90%，居住在公租房里的人员基本上都是附近企事业单位的工作人员。2020年，唐家岭村股份经济合作社从公租房项目中收取租金4933万元。

（五）发挥集体经济组织主体作用，发展壮大集体经济

唐家岭村在城市化转型进程中，充分发挥集体经济组织即村经济合作社、村股份经济合作社在集体经济发展中的主体作用。2012年，经北京市政府和海淀区政府批准的唐家岭产业园项目，就是利用集体土地建设的产业园，总用地面积103680.97平方米。唐家岭产业园项目由唐家岭村经济合作社开发建设，建设总投资11亿元。2011年4月，唐家岭村与西北旺镇下属企业北京百旺种植园签订为期20年的土地租赁合同，租赁面积为448亩，年租金为179.2万元。截至2020年底，唐家岭村集体经济总收入1亿多元。

（六）撤销村委会，实现村庄治理社区化

2002年，唐家岭地区就设立了唐家岭社区居委会。2019年2月，海淀区人民政府正式批复撤销唐家岭村民委员会建制。唐家岭撤村后，唐家岭村股份经济合作社与社区居委会联合办公，各司其职，共同推进工作。股份社的主要职能是发展壮大集体经济，促进集体资产保值增值，切实维护股东合法权益；居委会的职能是办理社区居民的公共事务和公益事业，组织开展社区便民利民服务、公益服务和志愿互助服务等。当社区在服务居民的过程中，出现经费缺口，股份社通过股东代表大会决议，可以向社区提供活动经费。

三、思考与启示

唐家岭村城市化转型提供的最大启示，就是要实现从城乡二元体制中的传统城市化转向城乡一体的新型城市化。

（一）农村集体产权制度改革是维护和发展农村集体和农民财产权利的有效方式

北京市按照“撤村不撤社、资产变股权、农民当股东”的思路和原则推进农村集体产权制度改革，比较公平合理地维护了农村集体和农民群众的财产权利，坚持和发展了新型集体经济，这是城市化进程中城中村和城郊村实现城市化转型发展最为重要的基本经验。唐家岭村的城市化转型就是坚持和受益于这条基本经验。

但国家层面支持农村集体产权制度改革的税收政策法律建设滞后和缺位比较突出。农村集体产权制度改革过程中可能涉及的增值税、企业所得税、土地增值税、资产转移所涉税收、回迁房和农民安居工程所涉税收、集体收益分配税收（红利税）等，都缺乏相应的税收政策法律支持。为深化农村集体产权制度改革，国家层面应当尽快研究出台支持集体产权制度改革和农村集体经济发展的税收制度、财政制度、金融制度，应当减免农村集体产权制度改革中相关税收，加大财政金融支持。

（二）农村集体经济组织是社区投资建设、经济发展和治理的重要主体

唐家岭村集体经济组织在城市化转型发展中发挥了不可替代的作用，主要体现在三个方面：一是发挥了村庄投资开发建设主体作用。唐家岭村经济合作社（股份经济合作社）及其所属公司承担了唐家岭村腾退改造和投资开发建设的重要任务，这就保障了村集体和村民成为村庄城市化建设的主体。二是承担了集体经济发展壮大的主体责任。唐家岭村经济合作社（股份经济合作社）及其所属公司负责集体产业园区建设和其他集体经济发展责任，这与那些将集体经济组织排除在外的村庄经济建设模式形成鲜明对比。三是发挥了社区治理的重要作用。无论是撤村前的村庄社区还是撤村后的城市社区，集体经济组织都是社区治理的重要主体之一，特别是在村庄城市化转型中，集体经济组织具有其他组织都难以具备的文化纽带、情感维系、经济依赖、服务保障等生活共同体功能。

但集体经济组织的发展仍然面临不少问题，需要与时俱进地改革完善。一方面，从外部环境上说，亟须加快构建集体经济组织公平发展的制度环境。另一方面，从内部治理来说，应当高度重视集体经济组织内部治理体系和治理能力现代化建设，维护和发展集体经济组织成员的民主权利和财产权利。

（三）集体建设用地入市是增强村庄自主发展的重大制度创新

农村集体建设用地入市是一项让多方受益的重大制度创新成果。一是实现了城乡结合部地区村庄从低端的“瓦片经济”向中高端的“租赁经济”的成功转型；二是为城乡结合部地区大量外来就业人口提供了相对体面的居住需要；三是为发展壮大集体经济提供了有保障、低风险、可持续的收入来源。

随着新修订的《土地管理法》及《土地管理法实施条例》施行，已于2004年7月1日施行的《北京市建设征地补偿安置办法》与上位法及实际情况极不相符，亟须全面系统地加以修改。一是建议由市人大常委会组织开展《北京市建设征地补偿安置办法》的修改工作，统筹兼顾，超越部门利益的羁绊，保障地方立法的公正性和权威性。二是适应乡村振兴和新型城市化发展的现实需要，调整和改变长期以来土地增减挂钩的政策做法，保障和规范城乡结合部地区村庄以及传统乡村地区产业用地的需求。三是保障农村集体经济组织利用集体经营性建设用地入市的自主权，规范集体经营性建设用地入市相关程序，制定公平合理的集体经营性建设用地入市税费政策，保障集体经济组织及其成员依法合理享有集体经营性建设用地入市的收益。

（四）城乡一体化的制度供给是新型农村城市化的迫切需要

在城乡二元体制尚未破除的情况下，农村城市化模式的基本内容：一是通过政府强制征地，将农村集体土地变性为国有土地，然后在国有土地上进行开发建设；二是通过征地农转非或整建制农转非，将农业户籍身份转变为非农业户籍身份；三是农村集体和农民缴纳巨额费用，将转非农民纳入城镇社会保障体系。唐家岭村的城市化转型，既体现了新型城市化的创新探索，又带有深刻的传统城市化模式的烙印。

新时期推进农村新型城市化，必须坚持和体现城乡一体化发展的根本要求。

一是贯彻落实城乡统一的户籍制度改革政策，停止实行征地农转非和整建制农转非政策。2014年7月国务院《关于进一步推进户籍制度改革的意见》以及2016年9月北京市政府印发的《关于进一步推进户籍制度改革的实施意见》，都明确规定建立城乡统一的户口登记制度，取消农业户口和非农业户口的划分，统一登记为居民户口。因此，征地农转非和整建制农转非已经失去了基本的政策前提，建议尽快修改《北京市建设征地补偿安置办法》中有关“逢征必转”的规定，不再实行征地农转非和整建制农转非。公安部门应当依据城乡统一的户口政策，免费将全市户籍居民户口统一更改登记为居民户口。全市城乡居民只有居住地和职业之分，不再有农业户口和非农业户口之别。

二是贯彻落实《土地管理法》和《土地管理法实施条例》，缩小征地范围，保障和规范集体建设用地入市。建议尽快修改《北京市建设征地补偿安置办法》有关建设征地的规定，严格遵守因公共利益需要征收农民集体土地的规定；明确和规范农村集体经济组织使用集体建设用地兴办企业或者与其他单位、个人以土地使用权入股、联营等形式共同兴办企业的相关规定，保障和赋予农村集体经济组织更多的土地发展权，发展壮大集体经济，促进共同富裕。随着城市化和城乡一体化的发展，一个重要现象是，城市也有农村集体土地，也有农业产业；农村也有国有土地，也有非农产业。因此有关“城市土地属于国有、城市郊区和农村土地属于集体所有”的静止性法律规定应当重新认识和调整。

三是加快推进和实现城乡基本公共服务均等化，改变“逢征必保”政策体系。在城乡统一的社会保障制度建立之前确立的“逢征必保”政策已经不合时宜，建议尽快废止《北京市建设征地补偿安置办法》有关“逢征必保”的规定及其延伸的超转人员生活和医疗保障规定，统一走城乡基本公共服务均等化之路。应当明确的是，不管是否被征地，农民都应有平等享有社会保障的权利。应当按照城乡基本公共服务均等化的政策路径加快提高农民社会保障水平。建议将城镇职工和城乡居民两套基本医疗保险、基本养老保险政策，统一整合为不分城乡、身份和职业的基本医疗保险和基本养老保险。为加快补齐农民社会保障短板，建议从土地出让收入中设立专项资金用于提高农民社会保障水平，可以优先补齐撤村建居地区农民社会保障与市民社会保障的差距。

四是统筹推进城市化中的撤村与建居工作，将社区公共服务供给纳入公共财政保障体系。撤村与建居是城市化中的重大问题，涉及多个职能部门方方面面的工作，需要统筹兼顾，相互衔接。城市化进程中撤销村委会后，原村委会负责的社区公共管理和公共服务事务应当有序移交给社区居委会负责，相关公共产品供给费用应当纳入公共财政保障范围。撤村后保留和发展起来的集体经济组织在社区公共治理中承担重要职责，政府应当对集体经济组织所承担的社区公共服务给予相应的财政补贴，或减免相关税费，合理减轻集体经济组织的社会性负担。

百年辛庄变新庄——北京市昌平区兴寿镇辛庄村的调查思考与建议（节）

作为北京市昌平区兴寿镇所辖21个行政村之一的辛庄村，有着数百年的建村历史，曾是一个十分普通平凡的北方村庄，但在2023年10月召开的北京市“百村示范、千村振兴”工程动员部署会上，辛庄村入选全市首批19个、昌平区唯一一个乡村振兴示范创建村行列。为探究辛庄村近十多年来的发展密码，助推乡村振兴示范村创建工作，展望首都乡村未来前景，2023年10月—12月，笔者先后7次到该村调研，发现辛庄村发展的一条重要路径是“环境好、人才聚、村庄兴”，展现出的一条重要特征是“一村涵容一学校，一校激活一村庄”，深藏其中的一条活乡兴村密码是“开放、包容、融合”。辛庄村在发展特色草莓产业的基础上，积极营造优良环境吸引向上学校等城市要素进村发展，向上学校则以丰富的人才资源助力辛庄村发展，实现了城乡要素优势互补、有机结合，推动了城乡融合发展的村庄实践。入选全市首批乡村振兴示范村创建行列后，辛庄村应当立足北京城市战略定位，坚持首善标准，着眼于建设中华民族现代文明，高起点高标准高品位推进乡村振兴示范村创建工作，努力建设成为一个拥有莓好产业、美丽乡村、美好生活、美学品格、美满幸福，具有高国民素质、高文明程度、高生活品质的首都发达村庄。

一、基本情况

辛庄村位于北京市昌平区兴寿镇东部，村域面积3407亩，其中农用地1671亩，集体建设用地985.4亩。在农用地中，耕地1075亩、园地546亩、林地46亩、其他农用地4亩。在集体建设用地中，农村宅基地600.8亩，共有宅基地360宗；现有集体经营性建设用地11.5亩。2011年6月8日，辛庄村完成农村集体产权制度改革，成立村股份经济合作社，共有股东1259人，股东实行静态管理。产改时点量化全村集体资产总额4268.8万元（含资源性资产）。2021年全村股金分红104.6万元，2022年股金分红85.96万元。截至2022年12月底，全村常住人口1670人，其中辛庄村户籍户数543户，户籍人口1013人，其中农业户319户626人，60岁以上人口340人。村“两委”干部9人，党员102人，村民代表47人。2022年村集体经营性收入227.2万元，农民人均所得19662元；2023年上半年村集体经营性收入191.3万元，农民人均所得10157元。

20世纪90年代以来，在快速城市化进程中，辛庄村年轻人纷纷离开村庄进城谋生求发展，村庄成为老人的留守之地。与许多村庄一样，辛庄村属于典型的空巢老人村庄。但这个传统的普通村庄，最近十多年来发生了巨大的变化，从一个老人留守的空心村发展成为网红打卡村，这主要缘起于辛庄村顺应城市化和逆城市化发展的需要，积极营造优良的环境，吸引一批批市民下乡进村，使城市要素与乡村资源、现代文明与农耕文明有机结合与融合发展，从而催生了该村从一个十分普通的村庄跻身到全市乡村振兴示范创建村的历史性飞跃。新时代的辛庄村是辛勤的新老村民在城市化与逆城市化并存的城乡融合发展大潮流中共同创造出来的新村庄。

二、主要做法

辛庄村所在的昌平区是首都西北部生态屏障，确立了建设科教引领、文旅融合、宜居宜业生态城市的发展目标；所在的兴寿镇有“北京草莓第一镇”之称。在昌平区委、区政府统筹推动和兴寿镇党委、政府的直接领导下，辛庄村结合自身实际，主动适应城乡融合发展大势，积极营造优良的宜居宜业环境，团结和带领新老村民群众走上了一条“环境好、人才聚、村庄兴”的发展之路。该村的主要做法有以下几方面。

（一）引进民办学校扎根，开启自然教育兴村新起点

十多年前，辛庄村积极引进以自然教育为理念的民办教育机构向上学校进村扎根发展，从此开启了该村教育兴村的新起点。向上学校（原名南山艺术学园）创办于2009年，最初由20多位创办者选择昌平区小汤山镇讲礼村办学，当时只有3个班58名学生。2012年7月，向上学校搬至办学环境更好的昌平区兴寿镇辛庄村的果满地扎根发展。在当年一些地方对市民进村创业并不欢迎甚至歧视排挤的情况下，辛庄村李志水书记却以开放包容的胸襟引进向上学校（2022年南山艺术学园与昌平向上学校合并，统称为向上学校，另保留南山艺术幼儿园），并为向上学校（南山学园）的生存和发展提供了许多便利条件，创造了适宜的创业生活环境。向上学校（南山学园）是由一批心怀自然教育理想、向往乡村田园生活的市民，到乡村寻找宜学宜居环境而创办的新式民办教育机构。他们推崇和践行自然教育，秉持以人为本、注重身体和心灵整体健康和谐发展的全人教育理念，注重传承和弘扬我国道法自然、天人合一的自然观以及源远流长的农耕文化传统，深得不少对城市生活感到焦虑和厌倦的市民们的认同。当年辛庄村“两委”干部在一家民企老板拟高价租地建私人庄园与几个市民只能低价租地办学之间，最终决定将村里一块30亩地以年租金45万元租给了相对更少租金的向上学校（南山学园）。当时村干部认为在村里办文化教育要比建私人庄园更好。正是村干部这个非常正确的选择，在成就了向上学校（南山学园）的同时，也成就了辛庄村。俗话说“栽下梧桐树，引得凤凰来。”辛庄村“两委”栽的“梧桐树”就是营造了吸引城市要素进村的良好环境，而向上学校（南山学园）就是辛庄村引来的“金凤凰”。向上学校（南山学园）最初在辛庄村办学时只有6名学前教育的学生，2023年已发展到330多名学生、80多名全职教师。该校授课老师均为大专以上学历，其中本科学历占46%，研究生以上学历占22%。向上学校（南山学园）是辛庄村最近十多年取得突破性发展极为重要的发动机和动力源。拥有高学历、高收入的向上学校（南山学园）学生家长们常年租住在该村生活和创业，日积月累汇聚成了该村文化教育兴村的强大能量。

（二）开展人居环境整治，树起生态健康立村新标杆

为改变当年村庄人居环境比较恶劣的状况，为向上学校（南山学园）师生、新老村民营造干净卫生舒适的人居环境，辛庄村“两委”干部与向上学校（南山学园）学生家长们共同开展了村庄人居环境整治行动。2016年3月，向上学校（南山学园）学生家长杨婧、唐莹莹等7位妈妈率先在村里组成“净公益”环保小组，开展“减塑环保”行动，坚持不用、少用塑料袋、纸杯等一次性物品。2016年6月9日，辛庄村全面启动垃圾分类工作，全村取消垃圾堆放点和垃圾桶，实行“两桶两箱分类法”，走在了全市乃至全国农村生活垃圾分类的前列。所谓“两桶两箱垃圾分类法”，就是全村各户在家中将厨余垃圾放一桶、其他生活垃圾放一桶，将有毒有害垃圾放一箱、可回收物品放一箱。村委会分别对应“两桶两箱”进行收集，实现垃圾不落地。经过两年努力，辛庄村人居环境显著改善，成功创建了农村生活垃圾分类的“辛庄模式”。2021年4月辛庄村被评为北京市生活垃圾分类示范村。2018年兴寿镇以辛庄村为样板，在全镇其他20个村推广生活垃圾分类工作，形成了农村生活垃圾分类的“兴寿模式”。2019年，辛庄村“两委”根据兴寿镇党委、政府统一工作部署，集中开展了村庄环境治理，拆除了私搭乱建，进一步改善了村容村貌。2020年1月4日，向上学校（南山学园）学生家长们联络中国生物多样性保护与绿色发展基金会良食基金在村里举办“新年食尚发布会暨辛庄良食节”活动，传递健康饮食和环保文化，提倡绿色有机食品，倡导安全健康生活。2021年1月，辛庄村被评为“首都文明村镇”，村党支部书记李志水被授予“首都环保达人”称号。2021年1月8日，在《新京报》第14届“感动社区人物评选颁奖典礼”上，杨婧获得感动社区人物金奖。

（三）营造乡村创业环境，形成人才产业兴村新气象

辛庄村“两委”为向上学校（南山学园）的师生及学生家长们不断营造良好的就学就业创业创新环境，实现了新老村民的和谐共生与生产生活的良性循环。2020年至2022年，在新冠疫情的影响下，越来越多来自北京中心城区乃至全国各地的高知人群为躲避疫情、远离都市，纷纷将孩子送到辛庄村里的向上学校（南山学园）学习，自己则租住村民闲置房子生活和创业。据初步统计，到2023年12月，辛庄村向上学校（南山学园）吸引了来自全国各地近400名学生、200多户新村民，在新村民中有7名博士、72名硕士、125名本科、59名党员齐聚辛庄村生活创业。传统村庄自身不可能培养产生并留住如此多数量、高素质的人才群，这为人才兴村提供了最为宝贵的人才资源。正因为辛庄村为各种高素质人才提供了良好政治生态和人文环境，从而将一个曾经寂静的空巢老人村激活成了创客云集、业态繁多的产业兴旺村。截至2023年12月底，该村共有外来创客70余家，其中教育培训11家、餐饮14家、民宿12家、医疗健康7家、非遗手工7家、超市6家、咖啡馆4家、糕点茶艺4家、露营营地1家、农业企业8家，新村民带来社会资本投资累计达1.2亿元。新村民的创业与生活，每年为村庄创造租金收入1053万元，明显带动了本村原住村民就业增收、拉动了农特产品的生产销售、提升了村庄教育文化品位。辛庄村创客创业的影响力也辐射到周边的东新城村、西新城村、上苑村、下苑村等9个村。辛庄村每两周举办一次环保市集，形成了京郊网红一条街，每次环保市集吸引1000人左右的体验消费者。草莓是该村主要种植作物和特色支柱产业，辛庄村依托2013年3月就开始举办的北京农业嘉年华，推动了全村草莓的种植、销售和农旅体验等活动。“昌平草莓”是国家地理标志产品，兴寿镇被称为“北京草莓第一镇”。昌平草莓看兴寿，兴寿草莓看辛庄。辛庄村在2003年就开始种植近300亩的红颜草莓。2023年底，全村现有温室草莓大棚518栋，种植面积310.8亩，草莓总产量486吨，总产值1742.86万元。此外，该村还有蔬菜大棚28栋，种植面积42亩；苹果种植面积146亩。经过多年的发展，辛庄村已初步形成了以绿色有机草莓为主导的乡村特色种植业与以民办教育为带动的乡村都市型服务业这两大产业集群相互促进、相得益彰、共同发展的特色产业兴村新格局。

（四）推行共建共生共享，绘就和美乡村治理新画卷

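At present new villagers and original residents each make up roughly half of Xinzhuang's permanent population, and together they form the village's community of daily life in the new era. Guided by the idea of building together, living together, and sharing together, the village's "two committees" (the Party branch and the villagers' committee) have worked to build a "five-colored golden bridge" that brings together committee cadres and villagers new and old, new villagers and old villagers, the village and the outside world, entrepreneurial talent and common prosperity, and farming culture and modern civilization. This has fostered a healthy political climate and cultural environment in the village, enriched villagers' lives, shown a distinctive practice of co-building, co-living, and sharing, and laid a foundation for the village's lasting development. First, Party-building leads: a red service bridge. The two committees help new villagers start businesses and settle in, providing services such as renting housing and land, water supply, road repair, and parking, and help old villagers with renting out houses, selling farm produce, employment, and elder care. They actively organize new and old villagers to take part in public affairs and public-interest undertakings such as improving the living environment. Second, nature and environmental protection: a green ecology bridge. Relying closely on new and old villagers, the two committees have established a view of nature education and of ecological industry that reveres nature and respects life, jointly run the "plastic-free, no litter on the ground" campaign, develop the green organic strawberry industry, and promote green, low-carbon living. Third, talent revitalizes the village: a cyan talent bridge. On one hand, the village attracts outstanding people to invest and start businesses, does everything it can to give new villagers a better environment for enterprise and daily life, makes full use of new villagers' generally high education, income, and taste to make up for the village's severe shortage of talent, and has specially appointed the vice-principal of Xiangshang School (向上学校, formerly Nanshan Academy) as assistant to the village head, a role that has provided very important intellectual support. On the other hand, it cultivates leaders in getting rich among the original residents, sends young people to Douyin's training for rural prosperity leaders, and encourages and welcomes "new rural worthies" (新乡贤) back to take part in the "I build my hometown" campaign. Fourth, pooling social forces: an orange public-welfare bridge. The village draws fully on honorary villagers, friendly businesses, Xiangshang youth, and other social forces, and supports and guides volunteers in launching and joining public-interest activities in environmental protection, good food, care for the elderly, help for the disabled, culture, education, and reading. In February 2023 Xinzhuang won the "Book-Loving Community" (书香社区) award in the Book-Loving Capital series of the 12th Shuxiang China Beijing Reading Season. Fifth, integrated development: a blue harmony bridge. Xinzhuang has worked the ideas and methods of Party-led governance combining self-governance, rule of law, and rule of virtue into daily production and life, fostering civility, passing on simple and honest folkways, building a safe village, and promoting harmonious coexistence and integration of new and old villagers. In December 2022 Xinzhuang was named a Beijing Democracy and Rule of Law Demonstration Village.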

三、思考和建议

最近十多年来，辛庄村实现了从京郊一个普通村庄到脱颖而出跻身全市首批乡村振兴示范村创建行列的第一次历史性飞跃，开放、包容、融合是其发展的活村密码。未来几年，辛庄村需要实现从全市乡村振兴示范创建村到建成产业强、乡村美、农民富的全市乡村振兴示范村以及村强民富、生态宜居、数字乡村、文化繁盛、文明善治的全市乡村振兴样板村的新飞跃，同样离不开开放、包容、融合的兴村要诀。开放容融活乡兴村。为使辛庄村在全市乡村振兴示范村创建中实现高质量的全面振兴，努力建成高水平的首都发达村庄，形成“中国辛庄”的乡村品牌，我们重点提出如下几方面的思考和建议。

（一）紧扣北京城市战略定位，着力将辛庄村规划建设成为体现“四个中心”功能建设、提高“四个服务”水平的首都特色村

首都乡村既是展现北京“四个中心”战略定位、履行“四个服务”的广阔空间，又是展示中国文明形象及北京首善标准的重要窗口。首都乡村，是伟大社会主义祖国的首都乡村、迈向中华民族伟大复兴的大国首都乡村、国际一流的和谐宜居之都乡村。建设首都乡村，就是要充分体现北京“四个中心”功能建设、“四个服务”的基本职责。

在制定辛庄村示范村创建规划时，要提高站位，拓宽视野，将北京“四个中心”的战略定位和“四个服务”的基本职责融入到示范村创建规划之中，着力建设首都特色村。

一是在政治中心功能规划建设上，要高度重视、因地制宜将京郊乡村作为承担国家政务活动的重要场所进行高品位的规划建设。可以考虑将辛庄村作为具有中国农味、北京韵味、乡村品味的一个重要乡村场景，规划建设体现中国特色、展现首都特点、呈现草莓特征的现代生态农场，突出规划建设北京草莓研学第一村、城乡融合发展示范村、生态文明建设样板村，为承接有关国家政务活动营造重要的乡村平台。

二是在文化中心功能规划建设上，要弘扬和建设辛庄村世代相传的中华传统农耕文化，依托有机草莓和向上学校（南山学园），开设辛庄文化大讲堂，建立乡村振兴专家团，建设草莓文化馆、草莓文创研学园，推动草莓文化、自然教育文化、都市农业文化、城乡融合文化、乡村艺术美学等规划建设。重点要围绕提高国民素质和社会文明程度，推动形成文明乡风、良好家风、淳朴民风，创新新时代文明实践站建设方式，利用重要传统民俗节日，持续举办为村里老人贴春联、送月饼、百家宴、村晚等创意文旅活动，助推学习型村庄、书香村庄、和谐村庄、草莓艺术村庄、美学村庄建设，形成体现社会全面进步、人的全面发展的现代城乡融合新文明。

三是在国际交往中心功能规划建设上，充分发挥辛庄村自然田园风光、悠久农耕文化、城乡融合发展、多元文化共生的独特魅力，围绕“自然学堂、莓好辛庄，在辛庄看见未来村庄”定位，突出有机草莓、自然教育、乡村文化的主题，以开放、包容、融合的心态和视野将辛庄村规划建设成为具有国际交往活动重要功能的乡村大舞台之一，为官方与民间丰富多彩的国际交往活动提供京郊田园式的国际知名乡村品牌“中国辛庄”。

四是在国际科技创新中心功能规划建设上，对接昌平未来科学城、农业中关村，围绕有机草莓、自然教育、农文旅研等特色优势，将辛庄村纳入乡村科技研发基地和科技应用示范区，突出数字乡村的建设、应用与示范；依托有机草莓、向上学校（南山学园），拓展农业科学、自然科学教育，强化科学普及，培育科学精神，弘扬科学文化。实施科技+农业、科技+乡村等“科技+”系列工程，加强乡村数字新基建，提升村庄产业发展和村庄治理的数字化水平。

五是在提高“四个服务”水平规划建设上，关键是要结合乡村特有功能、立足辛庄村实际，发展高质量的生态农业和乡村服务业，重点是要提供有机草莓等优质安全的农副产品、崇尚自然的现代全人教育、观光休闲的田园美景、旅游体验的乡村生活、宜居宜业宜游的乡村软硬环境，努力将辛庄村打造成为北京有机草莓第一村、食品安全第一村、自然教育第一村、营商环境第一村、北京服务第一村。

（二）把握大都市郊区化发展趋势，切实将辛庄村规划建设成为率先实现城乡融合发展的典型示范村

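The coexistence of urbanization and counter-urbanization is a common feature of China's current economic and social development. Put simply, urbanization is farmers moving into cities, and counter-urbanization is city dwellers moving to the countryside. As a megacity, Beijing began to show counter-urbanization as early as the start of the 21st century, in the specific form of suburbanization: the dispersal of population from the central districts of a very large city to its suburbs, which is how counter-urbanization appears on the edge of big cities. Beijing's counter-urbanization includes both the government-led strategy of coordinated Beijing-Tianjin-Hebei development, centered on relocating functions not essential to Beijing's role as the capital, and the spontaneous choice of residents to leave the central districts and live, work, and start businesses in suburban villages. Xinzhuang is a new village that has grown up precisely because city residents moved in during this wave of counter-urbanization, that is, suburbanization. Counter-urbanization provides powerful momentum and a valuable opportunity for integrated urban-rural development. In advancing both urbanization and counter-urbanization, Xinzhuang, as one of the city's first villages selected for creation as rural revitalization demonstration villages, should take the lead in achieving integrated urban-rural development and set an example.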

一是着力落实和创新户口登记制度，实现城乡居民户口身份上的平等和自由迁徙。实现城乡融合发展，既要打开城门，让农民进城成为新市民；也要打开村门，让市民下乡成为新村民。作为一个统一的现代国家，我们要建立健全全国城乡统一、开放、平等、公正的制度体系和制度框架，其中包括实现城乡居民户口身份上的平等和自由迁徙。应当将国务院和北京市有关户籍制度改革的最新政策意见真正落到实处，取消农业户籍与非农业户籍、本地户籍与外地户籍的划分，按常住人口居住地统一登记居民户口。在城市化和逆城市化进程中，农民选择进城就业居住生活就将其登记为城镇居民户口，市民选择下乡创业居住生活就将其登记为乡村居民户口。坚持户口随人走，社保随人转，从根本上解决人户分离问题。人始终是一个地区经济社会发展最重要最宝贵的第一资源，随着人口老龄化和少子化的加剧，人口资源的极端重要性将更加突显出来。建议取消“外来人口”“流动人口”的称谓，统一将进城的农民称之为新市民、进村的市民称之为新村民。辛庄村原住民中的年轻人大量进城就业居住生活，而留守在村里的老年人很难支撑村庄的可持续发展，新村民已经成为该村发展最为重要的生力军。为此，要将辛庄村常住人口中的新村民户口统一登记为辛庄村居民。切实保障城乡人口自由流动和迁徙，是从根本上破解乡村衰败、实现乡村振兴的战略举措。

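Second, increase the supply of public goods and public services, so that basic public services are equal and convenient across city and countryside. New and old villagers currently each make up about half of Xinzhuang's permanent population. It is a village on the edge of a big city that has been among the first to take on the natural form of urban-rural integration, with a population structure quite unlike that of either a traditional village or a traditional urban district, and this places new practical demands on the supply of public goods and on the equality and convenience of basic public services in such villages. In creating the demonstration village, it is necessary to strengthen the planning and provision of "hardware" such as rural industry projects, village appearance upgrades, and public service facilities, and even more to strengthen the planning and provision of "software": the policies, laws, regulations, and institutions for basic rural public services, rural cultural development, and rural public governance. (1) In upgrading the village's appearance and planning public service facilities, respect nature, guard tradition, hold culture in awe, and protect the village's distinctive tangible and intangible cultural heritage, so that villagers can see the mountains, see the water, and keep their sense of home (乡愁). Carry out micro-renovation and fine-grained upgrading suited to local conditions; strengthen the building of a "zero-waste village", with a focus on effective treatment and compliant discharge of village sewage and on better treatment of household and production waste; improve beautification and lighting; build beautiful courtyards; make the village forested, garden-like, pastoral, and artistic; and further raise its ecological livability to show a new level of "poetic dwelling". (2) In rural education and culture, give real effect to the priority development of rural education and culture, give equal weight to public and private education, and reinforce the idea that education revitalizes the village. In public education, increase investment; building on free compulsory education, waive tuition and miscellaneous fees for preschool and senior secondary education as soon as possible; and set up a free school lunch system so that students eat safely. Establish universal welfare systems for students and for family education. Vigorously innovate in teaching methods, strengthen nature education, general education, and rural art and aesthetics education, tackle the severe "involution" (内卷化) of education, and greatly reduce the homework burden on students and their parents. In private education, the first task is to resolve the practical problems Xiangshang School (向上学校, formerly Nanshan Academy) faces in continuing to develop, and to create a better policy and institutional environment for running schools. (3) In the village's public culture, strengthen public cultural facilities, increase the supply of public cultural goods and services, carry forward rural culture, strengthen protection of rural cultural heritage, promote the building of an art village, plan and build a village folk museum, cultural center, library, and village history museum, and organize the compilation of a village history. Drawing on organic strawberries, nature education, agriculture-culture-tourism-study activities, and urban-rural integration, hold rural cultural and artistic events such as the hundred-family banquet, the village gala, and strawberry tastings; enrich the Farmers' Harvest Festival with strawberry and nature-education elements; and let the art of "culture" and the power of "beauty" drive the cultural revitalization of the village. (4) In social security such as medical care and elder care, focus on the needs of the permanent population, invest more in the village community health service station so that new and old villagers can conveniently see a doctor nearby, and keep raising villagers' medical reimbursement rate toward the goal of free medical care. In 2023 Beijing's basic pension for urban and rural residents was 924 yuan per person per month and the welfare pension was 839 yuan per person per month, 1,763 yuan in total, far below the pensions of urban employees. Given the reality of rural population aging, and with reference to pension standards for urban employees and for farmers in Taiwan, increase investment in health and elder-care services, keep raising the rural basic pension and welfare pension, narrow the urban-rural gap in pension benefits, and improve villagers' old-age security.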

三是积极适应城乡融合发展的趋势和需要，改革和创新有利于城乡要素自由流动的体制机制。2019年4月，《中共中央国务院关于建立健全城乡融合发展体制机制和政策体系的意见》，明确提出要坚决破除妨碍城乡要素自由流动和平等交换的体制机制壁垒，促进各类要素更多向乡村流动，在乡村形成人才、土地、资金、产业、信息汇聚的良性循环，为乡村振兴注入新动能。全面推进乡村振兴面临的突出问题是，城市要素在向乡村流动时，作为城乡二元体制重要一元的传统农村封闭性体制机制没有相应地得到系统性改革和创新，造成了比较突出的制度改革严重滞后于实践发展的畸形社会现象，亟须解放思想，将改革开放进行到底。第一，健全农民市民化、市民村民化的机制。顺应城市化、逆城市化和城乡融合发展的大趋势，全面改革城乡二元体制，加快建立健全城乡统一、平等、开放、公平的制度体系，同步提升城市包容性和乡村包容性，确保农民进城变市民、市民下乡当村民。在实施乡村振兴战略中，要系统性地将下乡进村居住生活和创业就业的市民作为当地新村民来改革完善相关政策制度。第二，按照“三权分置”要求创新土地制度。放活和保障农村承包土地经营权，让更多新村民通过土地流转获得土地经营权而成为新农人。在解决新村民住宅问题上，按照宅基地所有权、资格权、使用权“三权分置”要求，近期要放活农村宅基地和农民房屋的使用权，赋予新村民租住原居民闲置宅基地和房屋的使用权，并予以颁证保护。依法保障原住民的土地承包经营权、宅基地使用权、集体收益分配权不受侵害。第三，深化农村集体产权制度改革，创新集体经济组织经营管理方式。随着人口自然老化与流动，已完成农村集体产权改革所确定和固化的原初集体经济组织成员（股东）将日趋减少甚至最后消失。必须与时俱进增补新村民作为集体经济组织成员，才能有效延续和维护集体经济组织的可持续发展。可以创设集体经济组织新成员（新股东）身份，明确相应的权利义务，做到既不侵害原集体经济组织成员（股东）的正当权益，又有利于集体经济组织吸收新成员（股东）后的可持续发展。对标集体经济组织特别法人定位和新村民的优势资源，加大村党支部办好村集体经济组织力度，在村集体经济组织下设立公司和专业合作社，建立平台公司，推进乡村经营，从新村民中优先选拔任用乡村经营优秀人才。可借鉴浙江经验设立强村富民公司，负责村庄产业发展和农产品品牌打造、乡村休闲观光体验旅游、承接村庄工程建设和管护、物业服务等事项；结合本村实际设立和发展草莓合作社、自然教育合作社、旅游合作社、住房合作社等。通过基层组织创新和制度创新，发展壮大新型集体经济，造福村民群众，促进共同富裕。

（三）深入贯彻绿色发展理念，明确将辛庄村规划建设成为生态涵养区乡村绿色产业发展的健康典范村

绿色发展理念是尊重自然、顺应自然、保护自然的生态文明理念，是建设健康环境、守护健康生活、保障健康身心的理念。辛庄村要立足生态涵养区功能定位实现绿色发展，重点是要突出以有机草莓为主导的乡村特色型种植业、以民办教育为带动的乡村都市型服务业这两大特色支柱产业，明确“莓好产业、自然教育、农文旅研”等乡村产业发展定位，推动和实现生态产业化、产业生态化、乡村艺术化、艺术乡村化，打造食品安全、生态文明、城乡融合、村民共富的核心竞争力，建设绿色发展的健康村庄。

一是紧密结合全市“五子”联动要求实践绿色发展。辛庄村要主动参照或参与全市“五子”联动，以绿色发展为主线，以有机草莓、自然教育、农文旅研、乡村治理等为重点，推动乡村产业和乡村生活的生态化、绿色化、艺术化、健康化。第一，在参照国际科技创新中心建设中，强化科技赋能，积极对接“三城一区”（中关村科学城、怀柔科学城、未来科学城和北京经济技术开发区）主平台，引进科技要素入村，提升科技素养，为乡村振兴示范村创建插上科技的翅膀，重点引进和发展有利于有机草莓、自然教育、农文旅研、乡村治理等生态产业高质量发展和民生改善的科学技术，主动与国家和市属科研院所、国有企业合作，多方面开展科技示范项目，建设以有机草莓、自然教育等为主题的现代设施农业园区、自然教育园区、草莓研学园区，提升草莓、教育、文旅等乡村产业发展的科技含量和健康保障水平。第二，在参照“两区”即国家服务业扩大开放综合示范区、中国(北京)自由贸易试验区建设上，强化改革赋能，重在深化乡村绿色产业领域改革开放，发展有机草莓等高质量的乡村绿色产业以及自然教育等新型乡村服务业，建设市场化、法治化、国际化的乡村营商环境和开放型的乡村绿色发展体制机制。第三，在参照全球数字经济标杆城市建设上，强化数字赋能，推动现代信息技术在有机草莓、自然教育、农文旅研、乡村治理等生态农业和乡村生产生活领域的应用，着力促进数字技术与有机草莓、自然教育、农文旅研等乡村绿色产业深度融合。推动数字化赋能生态农业、数字化赋能乡村振兴、数字化赋能乡村健康服务、数字化赋能乡村治理。发展乡村数字普惠金融，更好满足创客等乡村经营主体的金融服务需求。第四，在参与以供给侧结构性改革创造新需求上，强化质量赋能，重点是大力发展以绿色有机草莓为代表的生态农业、以自然教育为引领的乡村新型服务业，打造绿色有机草莓生产加工品牌，为村庄生活人群和其他消费者提供绿色有机的农副产品，大力推行草莓、蔬菜、玉米等农作物的绿色有机种植和加工，推广自然教育、有机面包店、有机咖啡店、有机茶馆、有机餐厅和有机民宿等发展，率先建设首都健康有机乡村。第五，在参与以疏解北京非首都功能为“牛鼻子”推动京津冀协同发展中，迫切需要将京郊乡村与北京城市副中心、河北雄安新区一道作为疏解非首都功能的“鼎立三足”之一进行统筹规划建设。从全市层面看，一方面要加强顶层设计，将京津冀协同发展战略与首都乡村振兴战略有机结合起来推动乡村绿色发展，通过承接疏解的非首都功能促进京郊乡村振兴，以京郊乡村振兴助推京津冀协同发展。另一方面在制定政府主导非首都功能疏解到京郊乡村政策制度的同时，高度重视制定市场自主的非首都功能疏解到京郊乡村的政策制度。从辛庄村层面看，一方面要更加积极主动承接从市中心城区自主疏解到村里有利于绿色发展的城市要素，为向上学校（南山学园）等众多来自都市的乡村创客排忧解难，进一步营造可以预期、长期稳定的制度环境；另一方面要主动参与京津冀协同发展，在京津冀大范围内加强生态农业合作发展、农文旅研合作共享，扩大和形成辐射京津冀的村庄生产生活圈。

二是充分利用村庄周边特有的外部优势资源推动绿色发展。跳出村庄看村庄，以更宽广的视野将辛庄村周边特有的外部优势资源纳入规划建设之中。辛庄村距北京大杨山国家森林公园10.3公里，可以将辛庄村作为北京大杨山国家森林公园周边的休闲旅游体验度假村进行规划建设。辛庄村北靠燕山山脉，京密引水渠穿村而过，可借此做好绿色发展的山水大文章，开辟登山健身步道，发展乡村体育；规划建设燕山文化艺术馆、京密引水渠博物馆、艺术馆。主动对接昌平未来科学城，为在辛庄看见未来村庄注入科学元素与活力因子。通过引进科技元素发展科技农业、开设科技小院、建设科技之村。依托距北京农业嘉年华3.6公里的区位优势，大力发展有机草莓品牌和其他有机农业品牌，建设草莓研学园、有机农业园。辛庄村距离中国国家版本馆3.5公里，可借助中国国家版本馆优势，强化文化赋能，实现联动发展，传承弘扬中华优秀传统文化，发展乡村绿色农耕文化，建设辛庄村史馆、乡村博物馆、乡村文化馆、民俗艺术馆，组织编修村史村志，推动绿色文化兴村。

三是切实立足本村农味乡情优势和现有基础提升绿色发展。进一步提升人居环境整治水平，实施农村生活垃圾分类“辛庄模式”提升工程，规划建设环保主题公园，在新的起点上发挥全市农村生活垃圾分类示范村带动效应，大力开展村庄绿化、美化行动，推动乡村美学发展，大幅度提高村庄林木花草覆盖率，建设首都森林村庄、花园村庄、艺术村庄，营造乡村“诗意的栖居”。调整优化生态涵养区产业禁限目录，发展与生态涵养功能相适应的绿色产业，拓展绿色产业发展空间，落实有机草莓等绿色产业用地保障，推行村庄全域绿色有机农产品生产和精加工，积极创建农产品质量安全村、食品安全村、饮食安全村，保障新老村民和游客“舌尖上的安全”。促进有机草莓和自然教育的深度融合，进一步提升有机草莓品牌建设，打造北京草莓研学第一村，形成有机草莓+自然教育+城乡文化融合发展的新模式。持续推进京郊网红一条街建设，提升吸引广大市民参与体验的乡村网红市集的内涵和品质。加强与周边从事有机农产品生产加工的村庄、合作社、农场、企业等建立有机农产品生产销售联盟。充分发挥向上学校（南山学园）的资源优势，持续推动自然教育等乡村新型服务业的发展，规划建设产学研一体的自然教育园区，设立创客中心，切实为乡村创客提供更优良的法治化营商环境，展现“北京服务”的乡村样板。在加大财政资金支持示范村创建的同时，通过优化村庄营商环境，吸引金融资金、社会资本参与乡村振兴。积极对接各类金融机构，引导金融机构进村入户，紧密结合绿色有机草莓等生态农业发展、乡村创客等新型服务业需求，在乡村大地上做好科技金融、绿色金融、普惠金融、养老金融、数字金融支持示范村创建五篇大文章。推动金融机构为辛庄村有机农业发展、美丽乡村建设、人居环境改善、乡村创客创业、村民共同富裕等提供金融服务支持，着力建设金融惠农示范村、金融兴村示范村。加大政策性农业保险扩面、增品、提标工作力度，将草莓等有机农产品种植纳入农业保险，发挥农业保险在稳定新老农人从事农业生产的经营收入预期，建设农业保险示范村。

（四）着眼于建设中华民族现代文明，全力将辛庄村规划建设成为现代价值观引领乡村文明新风尚的善治样板村
一是要彰显和推行开放包容融合的善治之要。
二是要坚持和践行自治法治德治的善治之道。
三是要保障和发展人权产权治权的善治之本。

本文转自《北京农村经济》2024年第1期、第2期

2024-11-07
让-巴普蒂斯特·德·帕纳菲厄《沙滩上的智人：带着人类演化史去度假》

目录
序言
第一章　起源
起立，猴子！　“大有可为”的基因突变　开始双足行走　双足行走有何好处？
第二章　南方古猿
汤恩幼儿　露西　“开枝散叶”的南方古猿　傍人
第三章　原始人类
在人属诞生之前　怎样才算人类？　最初的人属　最初的工具　容量与日俱增的大脑　直立人是个大个子　多种用途的两面器　新面貌　“开枝散叶”的原始人
第四章　去往世界尽头
走出非洲　改变　最初的欧洲人　狩猎与传统　火的掌控
第五章　其他人属物种
欧洲的尼安德特人　尼安德特文化　尼安德特艺术家　冰期的幸存者　丹尼索瓦人　弗洛里斯的“霍比特人”　其他人属物种的结局
第六章　最初的智人
智人的出现　既是智人又是现代人！　伊甸园　起源问题
第七章　征服地球
从非洲到美洲　迁徙造就智人　旧石器时代晚期的文化　新人类？
第八章　史前时代的结束
中石器时代　大型动物的灭绝　新石器时代革命　基因变化　迁移
结语　今天的智人
过去的痕迹　基因的多样性　人类种族存在吗？　未来的人类　控制演化的痴心妄想

序言

智人（Homo sapiens）是现今人类的祖先，大约在30万年前出现在非洲。在智人诞生前的数百万年里，非洲大陆上生活着一些双足行走的人科动物，它们的后代就是我们所说的智人了。在不断演化的过程中，一些人科动物彼此隔离，隔离的时间足够久之后，演化出了不同的物种。在历史上的许多时候，地球上同时生活着不止一种人类。

在过去的几十年里，由于古人类学家的辛勤努力，我们得以重建这一段人类历史，并绘制了人类的遗传树（古人类学家称之为系统发生树），其繁茂程度是前人所无法想象的。我们人类是如何从“人丁兴旺”的大家族中脱颖而出的？我们人类的演化是渐进式的还是跃迁式的？在演化过程中，我们是在什么时候，又是怎么成为今天意义上的人类的？这些问题里的一部分已经找到了答案，或至少找到了部分答案，但是这些答案又引发了新的问题。

从西班牙的“骨坑”到南非的斯泰克方丹洞穴，从格鲁吉亚的德玛尼西遗址到印度尼西亚的弗洛里斯岛，考古新发现层出不穷。得益于越发精细的考古挖掘技术，我们得以想象祖先生存的环境是什么模样。如今，我们也能够对岩石内部进行探测，进而揭示颅骨化石最微小的细节。而通过对化石进行化学分析，我们可以了解远古生物的饮食习惯。不过，真正意义上的革新，是对史前人类进行DNA（脱氧核糖核酸）分析。即便这种方法问世尚不足十年，古生物遗传学也已经取得了惊人的研究成果，比如确认了一个无人设想过的物种的存在，抑或是提供了不同种的人类曾经相互杂交的证据。时至今日，我们身上仍留有这段历史的痕迹。

无论是在社会层面还是政治层面，人类起源都是个敏感话题。如今，许多人仍然坚信宗教神话，不愿意面对冷冰冰的化石骨骸证据，不愿意相信人类源自动物的事实，不愿意承认人性是缓慢习得的。古人类学的历史，也是我们人类社会的历史。某些国家的研究人员在培养民族自豪感的目标驱动下，试图寻找某种比其他人更古老或更灵巧的古人类，以回溯本民族起源而非人类起源。

诚然，我们今天讲述的故事，未来可能会发生改变。未来的新发现，或将充实这套叙事，或将把某些篇章整个推倒重写。古人类学有助于我们理解我们是谁，并把人类作为一个具有多样性的整体来思考。我们之所以痴迷于研究自身的演化史，是因为它不但揭示了我们的起源，还揭示了我们的本性。

地质时期和文化年表

根据骨骼化石确定的不同人亚族的分布时期图

第一章　起源

黑猩猩和人类有许多相似之处，比如二者拥有一个最近共同祖先，由这个共同祖先分化而来。直至21世纪，我们才对自己的远祖——第一批原始人类——有了更加清晰的认识。

起立，猴子！

人类的祖先是一种哺乳动物，浑身毛发，长着尾巴和尖尖的耳朵，生活在旧世界，很可能过着树栖生活。 ——达尔文，1871

原始人类中可是有不少名人的，比如露西（Lucy，距今320万年），但我们的历史未必就要从露西开始写起。我们也可以把厚厚的家谱翻到30万年前，最早的智人降生的时刻；或者再往前翻到距今700万到1 000万年，最早的人亚族诞生时。我们还可以继续向前追溯：距今5 500万年，最早的灵长目动物登场；距今2.2亿年，最早的哺乳动物出现；大约5.5亿年前，最早的脊椎动物产生。

从动物学角度说，我们属于人亚族（Hominina）。人亚族包括了与黑猩猩亲缘关系更远、与现代人类亲缘关系更近的所有灵长目动物，比如南方古猿。自最早的人亚族诞生之时起，我们的历史便与现今依然存活于世的其他动物分道扬镳，因此，将关注点聚焦于最早的人亚族是个不错的选择。

根据古生物学和分子生物学数据，人类和黑猩猩的最近共同祖先生活在距今500万至1 000万年的非洲。之所以年代估算出现这么大的差值，是因为两门学科的研究成果无法就此达成一致：化石遗存显示最近共同祖先生活在700万到800万年前（甚至可能更早），但分子生物学的研究结果表明其生活在距今500万年到700万年之间。或许，以下事实能够解释出现这种现象的原因：在与祖先物种分化后，两个支系有过杂交，由此导致两个支系的分化期变长。

关于这个最近共同祖先，除了它可能群居且茹素外，我们所知甚少。我们不知道它究竟是四足行进还是双足行走。如果它四足行进，那么人亚族就是自行发展出了双足行走的典型特征；如果它双足行走，那就意味着更古老的灵长目动物早就开始依赖双腿行动了，而后来的黑猩猩则退回了一种特殊的四足行进方式——移动时以双手第二指骨的背部作为支撑［即“指背行走”（knuckle-walking）］。

虽然我们依然不甚了解这个最近共同祖先，但化石的存在使我们得以管窥它的面貌。2000年，古人类学家马丁·皮克福德（Martin Pickford）和布里吉特·森努特（Brigitte Senut）共同描述了属于一个新物种的骨化石，这个新物种名叫图根原人（Orrorin tugenensis），生活在600万年前的肯尼亚。根据股骨颈的内部结构，皮克福德和森努特猜测，图根原人经常双足行走。图根原人生活在森林里，擅长攀缘树枝。

一年后，研究员米歇尔·布鲁内特（Michel Brunet）宣布，在乍得发现了生活在700万年前的乍得沙赫人（Sahelanthropus tchadensis）的一块头盖骨，并将其命名为“图迈”。根据枕骨大孔（指颅骨底部的孔，大脑通过此孔与脊髓相连）的位置，“图迈”似乎也靠双足行走。人类的枕骨大孔位于颅骨下方、脊柱正上方。黑猩猩的枕骨大孔则位于颅骨靠后的位置，与四足动物一样。

黑猩猩与人类的枕骨大孔对比图

然而，由于化石非常不完整，很难确定“图迈”在人亚族演化史中的位置。因此，部分古人类学家更倾向于将“图迈”归入日后演化为黑猩猩甚至大猩猩的谱系。我们之所以无法给“图迈”的演化位置下定论，是因为处于猿类和人类分化期前后的人科物种都具有很大的相似性。如果对“图迈”颅骨发现地找到的股骨加以分析，或许能够更加精确地确定它在灵长目演化树上的位置。

另一个有趣的化石来自地猿（Ardipithecus），其年代更近，保存也更完整。美国古人类学家蒂姆·D. 怀特（Tim D. White）对埃塞俄比亚多个发掘点出土的数以千计的整骨和碎骨进行了长达15年的精心研究，随后于2009年对这些可追溯到440万年前的地猿化石进行了解读。根据地猿化石周围的动物化石推断，地猿生活在森林里，身高约1.2米，既能行走又能攀缘。地猿长有对生的大脚趾，但不如黑猩猩的灵活。虽然双腿移动起来比大猩猩还要容易一些，但是地猿的双臂和指骨长而弯曲的手指非常适于树栖生活。地猿的犬齿强健有力，具有明显的祖先特征（直接遗传自祖先），脑容量接近黑猩猩。一些人认为地猿是南方古猿（和人类）的直系先祖，另一些人则将地猿视为远房表亲，与黑猩猩的亲缘关系更近。

最初的人亚族分布图

人猿总科、人科、人亚族

在灵长目动物中，失去了祖先的长尾而拥有了尾椎的猴子都被归入人猿总科（Hominoidea）。该科包括了原康修尔猿（Proconsul，2 300万年前生活在非洲）的全部后代和十来个现存物种：长臂猿、猩猩、大猩猩、黑猩猩、倭黑猩猩和人类。

原康修尔猿是第一批失去尾巴的猴子之一，也是人猿总科的祖先

除了尾椎之外，人猿总科的独特之处还在于手骨及肩胛骨的结构。人猿总科对应的是猴总科，即“旧世界猴”，后者依然长有长尾（尾巴并没有在进化过程中丧失）。至于美洲的“新世界猴”则属于阔鼻小目（Platyrrhini），是与前述两者亲缘关系更远的灵长目类群。

最近几十年，根据在亲缘关系、灭绝物种化石和DNA方面层见叠出的研究成果，人猿总科内部的分类经常出现变动。如今，人科（Hominidae）包括了猩猩、大猩猩、黑猩猩、倭黑猩猩、人类和许多化石物种。

至于人亚族，指的是人科内部与人类亲缘关系较近、与黑猩猩亲缘关系较远的全部物种。古人类学家一共描述过二十来种，包括乍得沙赫人、南方古猿、傍人，以及人属（Homo）的多个物种，比如能人（Homo habilis）、直立人（Homo erectus）、尼安德特人（Homo neanderthalensis）或智人。人们认为，这些物种都是双足行走的。

“大有可为”的基因突变

借助化石，我们能够了解最早的人亚族的大致面貌。现如今，我们拥有了一个与此迥异的补充性信息来源，那就是DNA。近些年来，基因测序已经成了生物学和古生物学的惯用研究手段（参见第10页《DNA、基因、突变》）。

人类和黑猩猩分化后，基因突变导致二者的DNA有所不同。已经发现的突变现象有：点突变（比如碱基A替换为碱基C），DNA片段缺失和重复，以及内部重组（人类和黑猩猩的染色体数量不同）。

一些基因突变并没有产生明显的后果，另一些可就是导致人类区别于黑猩猩的“元凶”了。通过对比人类和黑猩猩的基因组，人们希望能够确定导致二者演化分离的遗传事件。

在人类和黑猩猩的分化过程中，共同祖先某些DNA片段的遗失或失活似乎发挥了重要作用。人们发现，在一种参与合成肌球蛋白（肌肉收缩所必需的一种蛋白质）的基因上，人类和黑猩猩有所不同。基因MYH16负责合成一种咀嚼肌特有的肌球蛋白。然而，人类体内的MYH16基因却失活了。或许，正是这个突变导致了人类支系的下颌变小。

一些突变可能导致行为上的变化。比如，人类失去了形成触须（粗壮的感觉毛，包括黑猩猩在内的许多哺乳动物都有）和阴茎刺（覆盖在黑猩猩阴茎表面的小型角蛋白突起）的基因。失去阴茎刺会使阴茎敏感度降低，交配时间延长（黑猩猩可是出了名的快枪手）。另外，我们还知道，失去阴茎刺的灵长目往往都是单配偶型物种。

这一变化也关系到人类和猿类的其他区别，比如：人类在排卵期开始前不再有身体上的变化，以及出现乳房和光滑脸庞等第二性征。使得交配时间延长的基因突变或许改变了人亚族的生活方式，强化了雄性与雌性之间的纽带，而这一纽带正是实现社会凝聚、更好地保护后代的关键因素。

DNA、基因、突变

我们体内的每个细胞都含有46条染色体。所谓的染色体，就是扭曲折叠的DNA细丝。人类的基因组（也就是全部的DNA）由32亿个排成链状的核苷酸组成，核苷酸分为A、T、C、G四种。所谓的DNA测序，就是确定一个个核苷酸的排列顺序（比如AGATCC）。在不同物种之间或同一物种的不同个体之间，都可以进行核苷酸序列对比。
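上面提到的“核苷酸序列对比”可以用一个极简的草图来体会。下面的示例仅作示意：函数名 `point_differences` 和变体序列 `AGCTCC` 都是为说明而假设的，只有 `AGATCC` 来自正文；它只比较等长序列中的点突变，不处理片段缺失、重复或重组，与真实的基因组比对工具相去甚远。

```python
def point_differences(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """返回两条等长序列中碱基不同的位置（从0开始计数）及两侧的碱基。"""
    if len(seq_a) != len(seq_b):
        raise ValueError("本示例只比较等长序列，不处理缺失或重复")
    valid = set("ATCG")
    if not (set(seq_a) <= valid and set(seq_b) <= valid):
        raise ValueError("序列只能包含 A、T、C、G 四种核苷酸")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


reference = "AGATCC"  # 正文中举例的序列
variant = "AGCTCC"    # 假设的变体：第3个核苷酸由 A 替换为 C
print(point_differences(reference, variant))
```

在这个假设的例子里，两条序列只在一个位置上不同，对应正文所说的“点突变”。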

基因是细胞为了生产自身活动所需分子而转录的DNA片段。人类拥有2万个基因，其中包含了人体发育和细胞正常工作所需的全部信息。DNA的其他部分在调节这套转录系统时起着至关重要的作用，可以控制基因的“表达”（也就是基因的活动）。实际上，在不同的发育阶段或不同类型的细胞里，基因活性也有高有低。

突变指偶然发生的DNA序列改变。基因发生突变时，其活性往往也会改变。每个基因都可能因为先前发生的突变而存在多种变体，即所谓的等位基因。

如果某个基因突变导致生殖细胞（卵子或精子）发生变化，而该生殖细胞又成功受胎，那这个突变将出现在由此细胞孕育而成的新个体的所有体细胞里（不过仅存在于新个体自身一半的生殖细胞内）。这样一来，突变就能一代代传递下去。每个生物个体都带有从亲代遗传而来的100到200个新的突变，不过大部分突变都没有产生什么显性影响。

开始双足行走

从四足行进过渡到双足行走是人亚族历史上的重大事件，因为两足的移动方式使其有别于绝大部分近亲［不过还有一些与人亚族无关联的灵长目动物也发展出了两足行走的能力，比如生存于800万年前的山猿（Oreopithecus）］。人类家族中出现的这一现象该怎么解释呢？

首先，这一移动方式的改变意味着身体骨架的全面重组，并且影响到了胚胎的发育。足部形成足弓，以支撑身体的全部重量。大脚趾与其他脚趾并列，再也不能与其他脚趾构成钳形。脚踝关节和膝盖关节得以强化，同时髋关节位置发生变化，使得双腿更加靠近身体重心线。为了使上半身保持竖直状态，需要强壮的肌肉；强壮的肌肉又塑造了我们的臀部，而臀部可说是典型的人类演化创新。骨盆呈盆状展开，上托腹腔脏器，下承大腿肌肉。除此以外，骨盆还须满足分娩的需要。双重限制之下，人类的妊娠期变短，使胎儿出生时颅骨发育不全，以便顺利通过骨盆入口。腰椎位于脊柱的底端，在强度提升的同时也变得更宽更短。枕骨大孔移动至颅骨正下方，大大减轻了支撑头部的颈部肌肉的负荷（参见第4页插图）。

在布里吉特·森努特和苏珊娜·K. S. 索普（Susannah K. S. Thorpe）等众多古人类学家看来，树栖（指一生中的大部分时间都栖息在树上）的人科动物或许最先发展出了双足行走的特征。我们的直系祖先恰恰生活在森林里，它们应该不是四足行进的，且极有可能习惯于攀缘！我们已经发现，作为现存树栖特征最为明显的猿类，猩猩在踏上柔软树枝时会尽量增加腿部的伸展幅度，与人类在有弹性的地面上奔跑时的肢体反应别无二致，而其他猴子的做法却恰恰相反。由此推断，地面上的双足行走应该是由树上的双足行走发展出来的。古生物学家称此现象为在树上进行的“直立姿势预适应”。

另有一些研究人员认为，是四足攀爬的猴类最先发展出了双足行走能力。远古人亚族［比如拉密达地猿（Ardipithecus ramidus）］的骨骼研究结果似乎证明了这一论断，因为远古人亚族的腕骨与现存四足灵长目动物的腕骨相似。还有一些研究人员则认为，双足行走最先出现在半水栖人科动物身上，然而迄今为止，尚未发现任何支持这种假说的化石！

但是，不管怎么样，我们都不应这样设想：人类从四足姿势“站起来”，历经数百万年，本着主观意愿，终于获得了我们今日了不起的直立行走姿势。首先，我们探讨的是解剖学意义上的进化，自从生命起源以来，在所有动物物种身上已经产生了不胜枚举的类似例证。达成某种目标（无论结果多么有益）并不需要诉诸意愿，哪怕只是无意识的。解剖学层面的演化创新或许在日后具有很大的益处，然而，以此益处为基础建立起来的解释体系却是不可接受的，因为进化只是进化，并不能预见物种将来需要什么！如果一定要给出一个达尔文式的解释，那就需要探究向着双足行走演进过程中的每个阶段分别带来了什么好处。有朝一日能够跑马拉松这样的好处可就不要提了，以双足姿势行走能比祖先移动时间更长这种朴实的小优势可能更合理。

人科动物演进图

似是而非的图画

这幅著名的图画诞生于1965年。画面上，四足行进的猴子在前进过程中渐渐站立起来，并朝着越来越像人类的方向演化：先是原始猴子，然后是南方古猿，接着是原始人类，再往后是尼安德特人，接下来是克罗马农人（Cro-Magnons），最后是大步迈向未来的现代人。

这幅从猴到人的行进图来自时代生活图书公司（Time-Life Books）出版的图书《早期人类》（The Early Man）。毋庸置疑，这幅插图在普及演化思想方面确实发挥了作用。可不幸的是，它在多个层面上都传达了错误的信息。首先，这幅插图给人的感觉是，这些灵长动物无一例外地朝着智人的方向前进，仿佛成为人类是它们不可避免的终极演化结果。其次，插图里的几个物种，并非一个就是另一个的后代：人类的演化不是直进式的，而是分支式的，在演化的过程中，许多物种都消失在了历史长河里，并没有留下任何后代。

双足行走有何好处？

成为人类，是从脚开始的。 ——安德烈·勒鲁瓦-古朗（André Leroi-Gourhan），1982

双足行走大有好处。首先，在探索周围环境时，双足行走成本更小。实际上，从能量消耗的角度上看，双足行走比四足行进更加经济。在速度相同的条件下，人消耗的能量仅为黑猩猩的四分之一。其实，黑猩猩也能双足行走；不过由于关节构造更适于四足行进，黑猩猩在两种移动方式下的能量消耗是相等的。

在1000万年前，全球气温略有下降，尤为重要的是，天气变得更加干燥。气候变化导致非洲广大的茂密森林消失不见，取而代之的是稀树草原和稀疏森林。一些人科动物选择继续在森林里度日，另一些则着手开发新的资源。在不同以往的环境条件下，后者充分利用了自身双足行走的能力，并在自然演化的作用下强化和巩固了这种移动方式。

在比森林更加开阔的环境里，站起来的好处或许就是看得远。然而，虽然是四足行进，狒狒却在稀树草原生活得如鱼得水。所以，站起来看得远并不是个非常充分的解释。另有一些假说将关注的焦点集中在大范围分散的食物来源上。事实上，在这种情况下，能够自由移动并将采集到的食物带给族群中的其他成员，着实是很有好处的。我们发现，与双足行走相伴而来的，是食谱的变化——块根和块茎在食物中占的比例更大了——这一显著变化导致了牙釉质加厚。与此相反，喜食水果或嫩叶的动物，比如大猩猩，牙釉质就偏薄。

另一个问题则涉及双足行走与制造工具之间的关系。双足行走是否通过解放双手促进了第一批石质工具的诞生呢？这个问题也可以反过来问：制造工具的需求是否促进了向双足行走的过渡呢？一些日本古人类学家倾向于后一种假说。他们认为，首先是双手变得灵巧，而且这一过程与双足行走是没有关联的。

另外，还有一种可能性。人类手指和脚趾中最为粗壮的当属大拇指和大脚趾，而它们的“发展壮大”可能是同一个演化机制作用的结果。采用双足行走姿态后，自然选择强烈作用于脚趾之上，这种强化转而又作用于拇指，进而使得双手更加灵巧。

一些古人类学家，如美国的欧文·拉夫乔伊（Owen Lovejoy），将双足行走的出现与向单配偶制的转变关联起来。最初的人亚族开始双足行走后，脑容量增大导致营养需求增加，雌性可能不得不分散到广阔的地域里寻找高能量的食物。雌性的分散可就苦了雄性，“妻妾成群”的雄性绞尽脑汁，只为了避免自己的配偶靠近其他雄性……单配偶制在我们的演化谱系中早早出现的假说，满足了保守的美国卫道士（如果他们凑巧还是演化论的支持者）的期望，但与实际情况却是背道而驰的。首先，脑容量增大是在几百万年后才发生的。其次，还需要考虑生物的性别二态性（sexual dimorphism）。所谓的性别二态性，指的是同一物种的雌性和雄性在身材和外形上的差异。在这一方面，我们对人亚族始祖一无所知。不过，继之而来的南方古猿具有非常明显的性别二态性，这与实行单配偶制的社会形式似乎并不匹配。

事实上，在灵长目动物中，两性之间体形差异过大的会形成“后宫型”社会组织形式，在这种社会里，一个雄性严格掌控一群雌性。正因如此，雄性大猩猩比雌性大猩猩要大得多，也重得多，这是激烈的性竞争导致的结果。雄性因为身体强壮、犬齿硕大而占据统治地位。于是，自然选择的天平向最为健壮的雄性倾斜，它们也得以将自身特征传给下一代。在大猩猩和狒狒中，雄性通过炫耀犬齿的方式来吓唬或制服竞争对手。相比之下，雌性的犬齿就非常小。而在黑猩猩族群中，社会结构更加灵活，虽然雄性也居于主导地位，但并不像大猩猩那么专横霸道，性别二态性也不如大猩猩那么明显。至于奉行单配偶制的长臂猿，它们的雌性和雄性具有相同的大小，犬齿也都很小。

在人亚族中，双足行走的发展与犬齿的减小是分不开的。南方古猿依然表现出比较明显的性别二态性，而在最初的人属物种中性别二态性已经有所降低，这就说明，在最初的人属物种中，雄性之间的争斗相对没有那么激烈，而单配偶制或许也更为普遍。

另一种假说则着眼于性选择。这里的性选择，不以雄性的好勇斗狠为基础，而以雌性做出的选择为基础——这在动物界可谓是屡见不鲜。雌性或许对自然而然保持直立姿势的雄性青眼有加，进而使得整个族群越来越趋于双足行走（因为这些雄性更多地将基因传了下去），随后，双足行走又因为在寻找食物上具有无可比拟的优势而得到进一步巩固与强化。

在人亚族向着双足行走演化的过程中，多种相得益彰的因素很有可能共同发挥了作用，比如：生活环境的改变，解放双手的优势，社会纽带的巩固，以及妙不可言的性！

人类演化：达尔文vs拉马克

在最近出版的一部著作里，我们还能读到这样的说法：尼安德特人的颌骨强健有力且向前凸出，是因为它们“重度使用”牙齿。现在，没有任何已知机制能够解释，为什么器官会因为被使用或不被使用而演化。某个个体的器官可以发生改变，但是这种改变并不能传给后代，这与19世纪初期拉马克所持的观点（如“用进废退”）恰恰相反。同样，外部环境的约束并不会直接塑造器官。

然而，想通这一点却实属不易。人们更乐于相信：之所以发展出双足行走的能力，是为了解放双手，并让祖先能够运用同期出现的大容量的大脑制造工具；或者反过来，大脑的演化注定是为了让我们能够制造工具，更何况我们的双手已经因双足行走得到了解放，而后者只是一种从属的演化适应而已。

在很长的时间里，拉马克的观点一直是法国动物学界和史前研究中的主流：我们祖先的演化，是朝着明确的方向进行的，也是有着明确的目标的，这个目标便是“人化”。后来，虽然这种定向演化（大概是在神的意志下发生的）的观点并未完全消弭于无形，但是进化理论和达尔文思想已经渐渐传播开来。如今，绝大多数古人类学家会通过自然选择或性选择理论来理解人亚族在数百万年里所经历的种种变化。

根据进化理论，生物的DNA偶然发生突变，而突变可在生物的种群中引发解剖学上的、生理上的或行为上的改变。当改变对生物个体有利时，生物个体便有更多生存和繁殖的机会，这种改变也就更有可能传给后代，并随着一代一代的繁殖而传遍整个种群。这个机制被称为自然选择；自然选择在整个生物界里屡见不鲜，而且形式多种多样。那种认为我们的祖先摆脱了演化规律约束的想法，是完全站不住脚的。

第二章　南方古猿

在500万年前至100万年前，南方古猿和它们的亲戚傍人在非洲稀树草原上繁衍生息。在很长一段时间里，对于这些双足行走的猿人，人们的了解只限于著名的露西女士。不过，自21世纪初期开始，骨骸化石的发现激增，让我们对这些人亚族物种有了更好的了解。

汤恩幼儿

第一个登上人类系统发生树的南方古猿，是绰号“汤恩幼儿”（Taung Child）的幼猿。“汤恩幼儿”的化石发现于南非汤恩的采石场，澳大利亚人类学家雷蒙德·达特（Raymond Dart）于1925年对其进行了描述。雷蒙德·达特确认“汤恩幼儿”是具有惊人特征的幼猿，认为它是猿和人之间的过渡物种，并将其命名为南方古猿非洲种（Australopithecus africanus）。

雷蒙德·达特展示“汤恩幼儿”的颅骨

“汤恩幼儿”的颅骨化石带有天然形成的脑模。雷蒙德·达特还指出，“汤恩幼儿”是双足行走的。现在，人们认为，“汤恩幼儿”是在230万年前被一只猛禽杀死的，殁年仅4岁。在当时的学界，雷蒙德·达特受到了激烈的抨击，人们期待中的“缺失环节”应该是一种有着猿的身体和类似人的大脑的生物，可大脑似猿而牙齿似人的“汤恩幼儿”与人们的预期相去甚远。而且，人们一直在亚洲而不是非洲寻找这个所谓的“缺失环节”。

随着新的化石接连出土，比如1947年在南非斯泰克方丹出土的普莱斯夫人（Mrs. Ples）颅骨化石，达特的观点逐渐被研究人员所接受。刚出土的时候，普莱斯夫人被命名为德兰士瓦迩人，后来才被确认与“汤恩幼儿”属于同一物种。普莱斯夫人为双足行走的猿人，身高约1.1米，臂长腿短，脑容量约为450毫升至500毫升，略大于黑猩猩脑。

普莱斯夫人，为南方古猿非洲种，出土于南非斯泰克方丹

1997年，古人类学家罗纳德·J. 克拉克（Ronald J. Clarke）在斯泰克方丹发现了一具近乎完整的南方古猿非洲种（或邻近物种）的骨架，并将其命名为“小脚”，其生活年代距今370万年。罗纳德·J. 克拉克认为，“小脚”是雌性古猿，身高约1.3米，去世时约30岁。由于骨骸被封存在极其坚硬的矸石之中，人们在20年之后才将它取出，直至2017年才对它进行了描述。

寻找缺失环节

“缺失环节”的概念诞生于19世纪，指的是能够解释从一种形态向另一种形态（比如从“猿”到人）过渡的缺失物种化石。正如其名称（猿人）所示，欧仁·杜布瓦（Eugène Dubois）发现的直立猿人（Pithecanthropus erectus，后来归入直立人；参见第77页第四章）本来有望成为“缺失环节”，但它与史前史学家彼时的想象实在是天差地别。

时至今日，“缺失环节”的概念已遭彻底弃用。一方面，演化不再被视为由一个一个的物种组成的演化链条，而是被视为枝杈繁多的演化树。另一方面，如果仅仅考察一个世系（即演化树的一个分支），那么它必然总有一些缺失环节，也就是说，总是会缺少某些从一个物种到另一个物种的转变阶段。实际上，由于化石化是极为罕见的现象，肯定不会所有的演化中间形态都能保留下来，特别是当演化速度非常快的时候（在地质年代的尺度上）。

自达尔文提出演化论以来，反对者便试图利用“过渡物种”的明显缺失来反对达尔文的观点。然而，古生物学家已经发现了为数众多的“过渡物种”，比如始祖鸟，这种具有爬行动物特征的鸟类说明了小型恐龙是怎样演化为鸟类的。达尔文尚在人世时，人们便已经对始祖鸟进行了描述，随后也发现了大量的中间物种。但是，在反对达尔文的人眼中，过渡物种总是欠缺的。对缺失环节的找寻，也只能以失败告终。

欧仁·杜布瓦于1891年发现的直立猿人（或爪哇人）遗骨

现在，人们在欧仁·杜布瓦发现的直立猿人附近又发现了大量化石。在这些化石上，原始特征和衍生特征、祖先特征和演化创新相互镶嵌，导致其演化位置很难被确认，加之化石数量众多，形成了不止一条演化链，所以现在的问题已经不是有环节缺失，反而是环节太多了！

露西

同一时期，另一副骨骸化石的发现使南方古猿的存在成为全世界普遍接受的观点。这副骨骸于1974年11月24日在莫里斯·塔伊布（Maurice Taieb）、伊夫·柯本斯（Yves Coppens）、唐纳德·约翰松（Donald Johanson）组织的埃塞俄比亚科考活动中被发现，并被编号为AL 288，后来根据披头士乐队的歌曲《缀满钻石天空下的露西》得名露西。

这件南方古猿化石标本包含大约40%的骨架，是当时发现的最为完整的远古人科生物化石。露西属于南方古猿阿法种（Australopithecus afarensis），现在已经发现了属于这个物种的300余件化石（几乎都是碎片）。

在410万年至290万年前，这些南方古猿生活在东非的稀树草原上。相应地，与人类相比，露西的双臂较长、双腿较短。露西既能双足行走又过着树栖生活，肩膀和手臂的构造非常适于攀缘；骨盆较大、股骨向内，使得行走时更加稳定。大脚趾偏离其他脚趾，靠脚掌外侧支撑全部体重；脚跟高高隆起，像黑猩猩一样。膝盖不能完全展开，导致它行走时比人类消耗更多的能量。

露西的骨骸，已有320万年的历史

露西主要食用水果和树叶，或许也吃小型动物，尤其是白蚁和容易捕捉的昆虫，它们富含营养物质，往往也数量众多。此外，在2010年，人们发现了一些食草动物骨骼，同时出土的还有南方古猿化石，在这些食草动物的骨骼上发现的切割痕迹，令人不禁猜想露西所属的南方古猿可能也有食腐行为，也就是说吃死亡动物尸体上的肉。这也意味着，露西曾经使用石质工具切割肌腱（参见《最初的工具》）。

露西通常被视为年轻的雌性古猿，身高约1.05米。这一物种的雄性平均身高为1.35米、雌性为1.1米，体重为25千克至45千克。它们颅骨较小（脑容量约400毫升），额头后缩，面部前凸。犬齿很小，具有现代特征；臼齿较大，更具原始特点。牙上覆着厚厚的牙釉质，以免牙齿快速磨损。牙齿在颌骨上呈圆弧状排列，曲度介于猿类的平行牙弓和人类的抛物线牙弓之间。

南方古猿阿法种的性别二态性相当明显。因此可以猜想，雄性之间的竞争极为激烈。幼崽的发育很可能比较缓慢，就像现存的猿类一样，而且亲代（雌性古猿？）照顾子代的时间很长。牙齿的化验结果表明，雌性在发育期后会改变食谱，而雄性却不会。为了解释这个现象，人们提出了如下的假说：雌性在成年后会离开原生族群并加入另一个族群，和现存的黑猩猩一样。

南方古猿阿法种和智人的颅骨对比

关于南方古猿阿法种的演化位置，人们曾展开激烈争论。虽然知名度高，但露西未必就是我们的祖先！30年前，众多美国研究人员将露西定为我们的先祖，可伊夫·柯本斯认为它只是我们的“姑婆”，代表一个已经灭绝的旁系。另一些研究人员则认为露西与傍人有亲缘关系（参见第43页）。如今，随着已被描述的南方古猿物种的增多，露西的演化位置很难有定论，更何况研究人员尚未就某些化石是否属于该物种达成共识。

年代测定

化石的年代与化石物种间是否存在亲缘关系无关：年代更古老的人亚族未必就是年代更近者的祖先！但精确测定化石年代对更好地理解物种演化至关重要。早先的年代测定只能给出相对年代：由于沉积物一层层沉积，从理论上说，在沉积物上层发现的化石比在其下层发现的化石年代更近。

今天，借助多种技术，我们已经可以确定化石的“绝对”年代，在时间的长河里将其精确定位。不少技术以石头或物质的天然辐射性为基础，碳—14年代测定法就是其中的典型代表。

碳元素以多种同位素的形式存在，其中包括普通的碳—12和稀有的放射性碳—14。植物利用空气中的二氧化碳进行光合作用时，会同时将这两种不同形式的碳元素吸收进体内。当植物被食草动物吃掉或食草动物被食肉动物吃掉的时候，食草动物或食肉动物也会将这些碳元素吸收进自己体内。在它们死亡以后，碳—14会慢慢衰变成为氮—14。碳—14的半衰期为5 734年，换句话说，碳—14需要花上5 734年才能失去一半的放射性。这么一来，通过测定骨头或木炭中两种碳同位素的比例，我们就能确定骨头或木炭的年代。不过，如果测定对象的年代在4万年以上，这个方法的测定结果就会非常不精确，因为碳—14的残余量还不到初始量的1%。
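上面这段关于半衰期的说明可以用几行代码验算一下。以下只是一个示意性的草图：半衰期取正文给出的5 734年，函数名 `remaining_fraction` 和 `age_from_fraction` 是为说明而假设的；真实的碳—14测年还需要校正曲线等步骤，这里一概从略。

```python
import math

HALF_LIFE_YEARS = 5734.0  # 正文给出的碳—14半衰期（年）


def remaining_fraction(age_years: float, half_life: float = HALF_LIFE_YEARS) -> float:
    """样品死亡 age_years 年后，碳—14残余量占初始量的比例。"""
    return 0.5 ** (age_years / half_life)


def age_from_fraction(fraction: float, half_life: float = HALF_LIFE_YEARS) -> float:
    """由残余比例反推年代（年）；fraction 必须落在 (0, 1] 区间内。"""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("残余比例必须大于0且不超过1")
    return -half_life * math.log2(fraction)


# 经过一个半衰期后恰好剩下一半
print(remaining_fraction(5734))
# 4万年后的残余比例：不到初始量的1%，与正文的说法一致
print(remaining_fraction(40_000) < 0.01)
# 残余量为初始量四分之一时，对应两个半衰期
print(age_from_fraction(0.25))
```

残余比例越小，测量误差对推算年代的影响就越大，这正是正文所说超过4万年后结果变得很不精确的原因。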

为了测定更加古老的骨头或岩石的年代，我们可以使用其他同类型的“原子钟”。比如：铀—钍定年法适用于测定50万年前的骨骸或石笋的年代，钾—氩定年法可用于确定几百万年前的火山岩的年代。

除此以外，还有基于其他物理原理的测定方法，比如：基于熔岩固结时磁性矿物记录的地磁场变化的古地磁法（paleomagnetism）；通过测量曾经经受高温的矿物在再次受热时发出的光以确定矿物年代的热释光法（thermoluminescence），这种方法适用于燧石和陶器；还有与热释光法原理相似但用于测定牙釉质、富碳化石（石笋、珊瑚等）或沉积石英颗粒年代的电子自旋共振法（Electron Spin Resonance，ESR）。

“开枝散叶”的南方古猿

如果一小部分弱小的原始人种群没有在非洲稀树草原残酷命运（或物种灭绝）的屡次打击中幸存下来，那智人就不会出现并迁徙到世界的各个角落。 ——斯蒂芬·J. 古尔德（Stephen J. Gould），1996

自“汤恩幼儿”出土以来，研究人员已经描述了为数众多的物种（参见第v页图），其中绝大多数发现于非洲大陆的南部和东部，年代在450万年前至200万年前。在这段漫长的历史时期里，有些物种不断演化并获得了全新的特征（即“直进演化”），或分成两个种群并逐渐分化直至形成两个新的物种（即“分支演化”），南方古猿家族的兴盛部分来源于此。

其间，这些物种中的某几种在同一时期生活在同一地区。它们之间可能并不会为了食物或其他有用资源展开直接竞争。否则，竞争通常会导致两个物种中的一个消亡或转化。有研究者已经描述了居于不同生态位的物种在齿系上的细微差别，这些细微差别正是上述假说的有力支撑。

直进演化和分支演化——互补的两种演化方式

人们认为，最古老的人亚族物种主要吃素，与现存主食水果和嫩叶的猿类似。但是，这并不妨碍黑猩猩吃白蚁并主动猎杀小型猿猴。南方古猿是否在树林中捕猎，现在已经不得而知，但是它们大概还是会吃昆虫和容易捕捉的小动物。

属与种

博物学家为每个现存物种或化石物种都取了由两部分组成的学名。这么一来，所有的南方古猿都拥有了相同的属名Australopithecus（即南方古猿属），这个属名说明了它们之间的相似性和亲缘关系。在南方古猿属下，存在多个“种”，比如南方古猿阿法种和南方古猿非洲种。

对于现存动物，“种”是互为亲代子代的或能够彼此交配繁衍后代的生物个体的集合。对于化石物种或古生物种，这些标准就不适用了：首先，我们无法考察它们的繁衍能力；其次，即便它们之间曾存在亲缘关系，在漫长的时间里它们也能变得足够不同，使人们将它们视为不同的物种。

通常情况下，如果新发现的骨骸与已知骨骸不同，便可确定为新物种（但也有例外，比如丹尼索瓦人就是通过DNA检测确定的，参见第106页《丹尼索瓦人》）。但是，仅仅凭借几块残骨便给某个人亚族生物取个种名往往很难做到，因为原始人种非常相似，往往只有几处骨头是某个物种特有的。在确定物种时，还需要考虑性别差异和发育过程中的变异。因此，原本分别定名为腊玛古猿（Ramapithecus）和西瓦古猿（Sivapithecus）的两个生物，后来被确定为同一物种的雄性个体和雌性个体。

另外，我们对物种的实际变异性所知甚少。如果拥有大量化石，还可以通过统计对比将某个化石归入某个类群。可是，当化石数量稀少且多为碎片的时候，判断的武断性就不可避免地增加了。

最后，还有一个不在科学范畴之内的现象：发现人亚族遗骨需要耗费大量心血和精力，这就导致研究人员往往会夸大新化石的特征并给化石取个新名字。这种操作有助于研究人员获得资金支持，尤其是当研究人员声称发现的是人类祖先的化石的时候，不过，这也导致本已相当复杂的系统发生树更加“枝繁叶茂”。所以，在科学出版物里，常常会有物种随着研究人员的偏好和科学知识的进步出现而后又消失的现象。实际上，学界历来将研究人员分为“分裂派”和“归并派”，前者倾向于利用似乎与其他物种有所区别的细枝末节创造新物种，后者倾向于考虑物种的自然变异性，将不同物种归并汇总，但是归并范围往往极为宽泛（参见第72页《“开枝散叶”的原始人》）。

除上文所述的南方古猿阿法种和南方古猿非洲种外，再略举几例南方古猿属的其他有趣物种。

南方古猿湖畔种（Australopithecus anamensis）

南方古猿湖畔种是根据在东非发现的化石描述的，经测定，其化石年代为420万年前至380万年前。南方古猿湖畔种身高约1.4米，生活在相当湿润的林地里，在双足行走方面比露西更强。下颌又长又窄，颇具原始特点；牙齿细小，更有现代特征。一些古人类学家认为，南方古猿湖畔种可能是人属的祖先。正因如此，有人提议将其改名为非洲前人（Praeanthropus africanus）。

黑猩猩、南方古猿湖畔种和现代人的下颌对比

南方古猿加扎勒河种（Australopithecus bahrelghazali）

1995年，米歇尔·布鲁内特率队在乍得发现了一块下颌骨化石，后将其命名为南方古猿加扎勒河种，昵称为“阿贝尔”（Abel）。这是唯一一种在非洲东部和南部以外地区发现的南方古猿，生活在360万年前。在这一时期，撒哈拉还是广袤的森林和稀树草原。南方古猿加扎勒河种可能并不是一个不同以往的物种，不过，这块下颌骨化石证明，南方古猿的领地范围比已知化石的分布区域更广。

南方古猿惊奇种（Australopithecus garhi）

生活在距今250万年的埃塞俄比亚的南方古猿惊奇种，于1997年由埃塞俄比亚古人类学家伯海恩·阿斯法（Berhane Asfaw）率领的研究团队发现，它们拥有较小的脑容量和巨大的牙齿。化石的共同发现者蒂姆·怀特猜想，南方古猿惊奇种有可能是我们的祖先。但是，它们与最初的人类生活在同一时期的事实，并不足以提高这种假设的说服力。

南方古猿近亲种（Australopithecus deyiremeda）

南方古猿近亲种于2011年发现于埃塞俄比亚，生活年代为340万年前，无论在地理区域上还是生活年代上，都可以视为露西的邻居。南方古猿近亲种的颌骨粗壮，牙齿形状也和露西不同，这说明其食谱略有不同。

南方古猿源泉种（Australopithecus sediba）

在南非马拉帕（Malapa）发现了两个保存状况相当完好的骨骼化石之后，李·伯杰（Lee Berger）于2010年描述了这个年代很晚近的物种（生活于200万年前至180万年前）。源泉种的大脑较小，但与其他南方古猿相比更加不对称，因此与人属更加接近。其骨盆比较宽，通常认为这与颅骨变大有关。胸腔呈锥形，上窄下宽，手臂可以做大幅度的动作，非常适于攀缘。脚跟具有原始特征，与猿人的脚跟相似，但脚踝比其他南方古猿更具现代特征。同样，源泉种的双手拇指较长、指节末端增宽；通常认为这是源泉种手巧的一个证明，也是它与人类更接近的一个特征。最后，源泉种的牙齿比阿法种小。

这种原始特征和全新特征（所谓的“衍生特征”）叠加的现象被称为“镶嵌演化”。演化并不会同步地触及所有器官，这就导致很难确定物种在人亚族系谱图上的精确位置。除此之外，还有一个难题：发现的两副骸骨中，一副属于年幼的雄性，其解剖学特征尚未最终定型，因为在个体的发育过程中许多骨头会发生变形。骸骨的发育模拟结果显示，其成年后的体形与南方古猿非洲种接近。正因如此，有一些人将源泉种视为非洲种的“接班人”，而非洲种在之后就灭种了，并没有留下直系后代。另一些人则与化石发现者一样将源泉种视为直立人可能的祖先，所以将它的种加词定为“sediba”，这在当地语言里正是“源泉”的意思。

南方古猿和傍人分布图

然而，源泉种本身也是年代相当近的物种了。在源泉种尚存活于世的时候，人属已经在非洲大地上生活几十万年了。只是，迄今发现的最为古老的化石也只是些碎片，化石的身份也非常有争议（参见第51页《最初的人属》）。此外，源泉种可能诞生得更早，但是至今尚未发现相关遗迹。

平脸肯尼亚人（Kenyanthropus platyops）

1999年，古人类学家米芙·利基（Meave Leakey）在肯尼亚的洛迈奎（Lomekwi）发掘点发现了一个颅骨，经测定其年代为340万年前。这个颅骨的面部扁平，与下颌前凸的南方古猿反差非常明显，特征上更接近古老的人属成员鲁道夫人。米芙·利基对其进行了描述，并因为它与其他物种差异甚大而为其取了新的属名“平脸肯尼亚人”。但是，由于在沉积压力作用过程中发生了形变，围绕这一颅骨化石的争议很大。

1976年，另一类型化石的发现点燃了研究人员的热情。在这一年，玛丽·利基（Mary Leakey）在坦桑尼亚莱托里（Laetoli）地区发现了南方古猿的脚印。370万年前，三只南方古猿列队前进，在火山灰中留下了脚印，火山灰硬化后便将脚印保存了下来。这些脚印为南方古猿双足行走提供了补充证据。

南方古猿的脚印，莱托里（坦桑尼亚），距今370万年

上述这些南方古猿物种中，一种将不断演化，最终产生最初的人类，另一种——也有可能是同一种——则演化成了傍人。还有一些继续维持原先的生活，直至彻底消亡在历史的长河里，没有留下任何子孙后代。

傍人

20世纪下半叶发现的部分南方古猿因颅骨硕大而被描述为“粗壮”型，其他的则相应地被描述为“纤细”型。随后，这些“粗壮”型南方古猿被归入广为接受的傍人属（Paranthropus）。

头大、颌沉是傍人的典型特征。傍人臼齿巨大，适于咀嚼质地坚硬且纤维丰富的食物。牙齿化验结果显示，一种傍人特别爱吃比嫩叶或水果坚硬得多的草本植物。草中往往富含二氧化硅，这也在傍人的牙齿上留下了非常典型的磨损痕迹。在傍人种群里，雄性比雌性大很多，而且与雄性大猩猩一样，颅骨上存在骨嵴，而骨嵴正是强壮的咀嚼肌的固着点。可是，虽然发现了为数众多的傍人颅骨化石，傍人的其他骨骼是什么情况，我们依然知之甚少。

在埃塞俄比亚发现的埃塞俄比亚傍人（Paranthropus aethiopicus）是最古老的傍人物种，生活在270万年前至230万年前。随后，鲍氏傍人（Paranthropus boisei）在东非出现，并一直生存至120万年前。第三种傍人名叫粗壮傍人（Paranthropus robustus），220万年前至100万年前生活于南非，有人认为它们应是南方古猿非洲种的后代。这三种傍人的确拥有一些共同特征，但它们的亲缘关系并没有那么明显。生活在相似环境中的物种能够演化出相似的特征，这种趋同现象在动物演化史上屡见不鲜，有时候确实容易与遗传得来的物种相似性混淆。

无论如何，到了距今约100万年时，所有的傍人都消失得一干二净，没有留下任何子孙后代。或许，与众不同的饮食习惯使傍人难以适应气候变化和环境改变？或许，人类在傍人的灭绝过程中发挥了某种作用？事实上，傍人的确曾与其他人亚族物种，即人属的成员在这个星球上共同生活过。

第三章　原始人类

在很长的一段时间里，人们一直认为，人类演化史是简单的线性历史：南方古猿演化为一种原始人，也就是能人；能人接着演化为具有现代身体的直立人，而直立人正是智人的直系祖先。然而，近些年来的考古发现对上述每个阶段都提出了质疑，同时勾勒了一幅更加复杂的演化图景。

在人属诞生之前

按照现代演化论的原则，人类起源的研究不能简单归纳为寻找假定存在的人类祖先。怎么就能够确信某个化石代表了某个物种的祖先呢？对于动物物种，古生物学家倾向于探寻它们之间的亲缘关系，而不考虑物种在时间上的先后顺序。如果化石显示两个物种具有相同的衍生特征（即解剖结构上的创新性状），我们就认为这两个物种有亲缘关系。这两个物种也就拥有共同祖先，不过，在大多数情况下，共同祖先都不是明确的，尽管某些化石可能与其相近。这种研究方法就是所谓的“支序分类”法，由此可以得到更加严谨、更便于客观探讨的系统发生树（展现物种之间的亲缘关系）。

涉及我们的物种时，人们往往会将科学理论的严谨性搁置一旁，因为将某个化石定位到人类的演化世系中具有极大的象征意义。无论是对还是错，直系祖先总比绝后表亲更引人关注。在众多的南方古猿和邻近物种（肯尼亚人、傍人等）中找出谁是现代人类谱系的真正起源、谁是最初的人属物种的祖先，确实很有诱惑力。

于是，古人类学家对化石进行探测，以图确定最为“类人”的特征。他们随即遇到了几个难题。一方面，由于遗骨不完整，往往缺少能够确定物种演化位置的有用要素。另一方面，如果采取这种人类中心视角的话，那每个物种都同时呈现出原始特征和“现代”特征，即更加类人的特征（参见《既是智人又是现代人！》）。

最后，正如我们前面提到的，在相似的环境压力作用下，演化可使得多个物种发生相似的改变。换言之，某个物种身上出现现代特征，并不能证明这个物种就是我们的先祖。因此，尽管都曾制造石质工具，但多个人亚族物种并没有因此被列入我们祖先的行列。

怎样才算人类？

即便能够确定某种南方古猿最有可能是人类支系的起源，也无助于找到下面这个重要问题的答案：在演化过程中，这个物种是什么时候变成人的？是不是存在某些明确无误的特征，能够将其鉴别为人类而不是南方古猿？

对古生物学家而言，这个问题马虎不得，因为他们要给发现的化石取名。物种的名称不是没有利害关系的。属名取为“南方古猿”还是“人”，这里面的差别很大，关系到能不能引起公众、记者和能为后续挖掘工作提供资金支持的机构的注意！当然了，按说不应该有这些顾虑的，但实际上这些顾虑的影响不容忽视。

这个问题不但是哲学问题（是否存在“人类特性”？），也是生物学问题（鉴于黑猩猩与人类的基因相近度，是否应将黑猩猩归入人属？），还是古人类学问题：从什么时候起，或变化积累到什么程度，某个人科物种就可以被算作人属了？这个问题也可以反过来问：从现代人开始回溯历史，最早在过去的哪个时刻我们的祖先就能被视为人类了？

自史前研究开始以来，许多人回答了这些问题。他们给出的答案里，往往借用了略显老套的“人类特性”。随着动物行为学、古人类学、神经学和分子生物学等多个学科不断取得新的进步，这些答案也很快地落伍了。

我们举两个例子。使用工具长期被视为典型的人类特征，但有些动物也会使用工具，比如黑猩猩（用木棍捕捉白蚁、用石头砸开坚果）、海豚（用海绵保护自己的吻突），甚至某些鸟类（用刺捕捉树皮下的蛴螬）。早在南方古猿独自在稀树草原上纵横的时候，就已经出现了最为古老的石质工具（参见第55页《最初的工具》）。另一个例子是大脑的增大。无论是在我们的历史中，还是对我们现今在动物界的地位而言，这个现象都非常重要，在某个时期甚至曾经合理化了“脑容量界值”（cerebral rubicon）的概念：脑容量低于某个值的，就是猿；脑容量高于某个值的，就是人。可是，无论是工具还是脑容量，类似的标准都必须摒弃，因为它们过于简化，没有真正的用处。

无论是基因还是解剖学特征，由于各器官以不同的速度演化，很难制定显而易见的临界标准——只要达到了这个标准，猿就应被称为人。在实际操作中，古人类学家根据的是一整套特征，其中包括了在化石上经常能够观察到的特征，比如脑容量或牙齿的大小和形态。但是，学界始终无法取得普遍共识；关于多个原始人种的分类，就一直未曾达成一致。

最初的人属

1961年，玛丽·利基和路易·利基（Louis Leakey）在坦桑尼亚的奥杜韦发现了一个人亚族生物的颅骨和手骨的化石碎片，这个生物生活在大约180万年前，与当时已知的南方古猿和傍人都不同。此前不久，他们在同一个挖掘点发现了一些石质工具和一个傍人的骨骼化石，并认为是这个傍人制造了这些工具。但是，新化石的发现改变了整个局面。这个新发现的人亚族生物，手掌更加类人，臼齿也比较小；初步估计脑容量约为600毫升，比南方古猿的脑容量大；指骨像黑猩猩一样呈弯曲状，但末端指节变宽，应该便于抓握物体。与此前发现的傍人相比，这个人亚族生物似乎更像是石质工具的打造者。它被命名为能人。

在接下来的几年里，古人类学家发现了许多新的原始人遗骨，不过这些遗骨具有不同的特征，似乎有必要将它们定义为新的物种，也就是后来的鲁道夫人（Homo rudolfensis，意为来自鲁道夫湖的人。鲁道夫湖即现在的图尔卡纳湖）。鲁道夫人体形更高大、面部更扁平。经测定，全部遗骨的年代都在大约230万年前至180万年前。到了2015年，在埃塞俄比亚的勒迪—戈拉鲁（Ledi-Geraru）发现的半个下颌骨化石，似乎将人属的诞生时间向前推了50万年，即距今280万年。

能人和智人的颅骨对比

同南方古猿的化石一样，能人的化石也不完整，往往呈碎片化，这导致复原工作很有争议。这些化石是属于两个不同的物种呢，还是属于一个具有很大形态多样性的种群呢？此外，这些人属生物表现出的明显性别二态性，使问题变得更加复杂。不过，这样一来，在化石上观察到的差异就可以部分地归为雌雄两性的差异。最近，人们甚至开始质疑它们是否应该被归入人属了。有些古人类学家认为，许多化石其实属于南方古猿，而真正的人属稍晚才会出现。

另一个难题是，我们对这些人属生物的颅后骨骼（即除了颅骨和颌骨外的全部骨骼）所知甚少。目前，尚未发现与露西同样完整的骨骸。根据已经出土的骨骼化石碎片，我们发现的是比南方古猿稍大也更善于双足行走的人亚族生物，尽管它们依然保留了部分树栖生活习性。它们拥有更短的大脚趾，行走起来更有效率，也具有了能够缓冲震荡的足弓。

地域偏见

年代在200万年以上的人亚族化石全都发现于非洲，这为人属的非洲起源假说奠定了基础。实际上，这些人亚族化石几乎全部出土于东非（从埃塞俄比亚到坦桑尼亚，特别是肯尼亚）。在南非，化石往往发现于洞穴中，那时候的人亚族生物不过是大型猫科动物的口中餐。由于滑坡和流水造成地层扰动，很难确定南非发现的化石的年代。

与此相反，在东非，连续不断的火山喷发让年代测定变得较为简单。半沙漠的自然环境为确定化石位置提供了极大的便利。更何况，东非的地质条件也非常有利。地壳板块运动导致地壳岩层断裂、分离，进而造就了漫长的东非大裂谷，大部分考古研究工作都是在东非大裂谷的两侧展开的。在大裂谷的形成过程中，沉积层发生倾斜，原先无法企及的地层现在触手可及。呈现在古人类学家面前的，是几十万年间形成的连续沉积层，而且还是在很小的面积内。

而在占非洲大陆面积95%以上的撒哈拉以南非洲，考古研究完全不能开展或很难开展。在又湿又热的森林地区，不但底层土壤难以企及，化石也往往因为环境不利于保存而消失不见。再往北，在撒哈拉沙漠里，考古工作非常辛苦，但也会结出累累硕果；乍得沙赫人“图迈”和南方古猿加扎勒河种的发现就是最好的证明。至于北非，对最初的人类化石来说，那片土地还是太年轻了。

其实，在具有相应年头且可能含有丰富化石的地方，只要努力寻找就能挖到人亚族化石。从目前人亚族化石的发现地来看，东非还不足以被视为不容置疑的人属“摇篮”。

最初的工具

2015年，在肯尼亚图尔卡纳湖畔的洛迈奎挖掘点，出土了最为古老的石质工具，都是粗糙凿成的，其中有石锤和用作石砧的巨大石块，年代为大约330万年前。

洛迈奎的原始工匠使用的制造技术相当简单：直接用要加工的石块（即所谓的“石核”）撞击石砧。这个技术被称为“撞击法”，可以加工锋利的石片，尽管很难对加工成果进行精细的控制。其实，石匠的目的可能只是获得石片，石核不过是锤击产生的残留物。但无论如何，这种加工行为意味着它们对所需物品产生了心理表征。

由于这个时候人属尚未登上人亚族演化的舞台，所以这些工具不可能是人属物种制造的。那时候，在非洲大陆上活跃的人亚族物种只有南方古猿，尤其是南方古猿阿法种（即露西所属的物种）和平脸肯尼亚人。不但没有任何证据能够将这些工具与某个特定的人亚族物种联系起来，而且制造石器的生产工艺也曾被不同物种（包括人类演化谱系以外的物种）屡次加以改进和完善。

能人的工具制造精度更高。这些被称为“砾石砍砸器”的石质工具，至少一侧具有锋利的刃口。能人制造工具时，通常是一手握着加工对象，一手握着石锤。加工产生的碎片也能为之所用。这些砍砸器定义了人类有史以来的第一个石器文化——奥杜韦文化（Oldowan，以其发现地奥杜韦命名）。

锋利石片的生产，可能为能人日后的成功提供了助力，使它们在获取更加多样化的食物方面拥有了巨大的优势。实际上，这些石质工具表明，它们越来越适应食肉的饮食习性。

属于奥杜韦文化的石质工具

容量与日俱增的大脑

同南方古猿的情况一样，人属的诞生似乎也与气候变化有关。在大约290万年前至240万年前，气候变得更加凉爽也更加干燥，由此导致森林的面积进一步缩小，并分割成更加开阔、更加多样的栖息地。

与南方古猿相比，人属物种的食物种类更杂，肉类和脂肪所占的比例也更高。人属物种确曾取食尸体上的肉，但并不能因此认为它们拥有猎杀水牛和其他大型动物的能力，这更多的是食腐行为（指食用意外死亡的动物或大型食肉动物杀死的猎物的尸体），而且还要与鬣狗和秃鹫争抢才行。借助手中锋利的石质工具，它们能够切断肌腱获取兽肉，并砸开骨头食用骨髓。

富含蛋白质和脂类的动物性食物的增加，或许与人亚族大脑的增大有所关联。实际上，大脑重量虽然仅占人类体重的2%，能量消耗却占人体能量消耗总量的20%左右（当然是在不做体力劳动的情况下）。大脑大，就需要进食营养格外丰富的食物。

拥有大容量的大脑有什么好处呢？同双足行走一样，我们不能用大脑在几百万年后才显现的优点对此加以解释。一般而言，灵长目动物的大脑比羚羊和猫科动物的大脑更发达。这个特点与寻找食物没有关系，与危机四伏的野外生活也没有关系，而是与灵长目的社会组织形式有关。分辨敌我、日常协作、构建长期联盟关系等等，构成了个体间纷繁复杂的关系，而这又要求对族群内部关系有深入的了解和理解。对南方古猿来说，在面对比森林更加凶险的开阔环境时，抱团生活会安全很多。这样一来，由于能够促使族群成员之间建立深入合作关系，较大的大脑就具有了演化优势。

但是，安全也是要付出代价的！发达的大脑需要营养更加丰富的动物性食物。动物性食物更容易消化，其吸收过程对肠道造成的负担小，肠道消耗的能量相应地更少，由此节省下来的能量正好可为大脑所用。如果大脑能够更加高效地运转，就能够找到更多的食物来源或者制造有助于获取食物的工具。这是发达的大脑带来的第一个良性循环！其产生的第二个良性循环如下：较大的大脑有助于族群成员搭建良好的社会关系，反过来，良好的社会关系确保了它们更好地开发利用环境，比如，通过共享新资源等方式。在今天看来，这些相互作用至少部分解释了人属为什么会出现。

脑容量

通过某个人亚族生物的颅骨化石，可以大致估算其脑容量大小。在同一个物种内部，脑容量的差异非常大（人类的脑容量为1000毫升至2000毫升，最大值几乎是最小值的两倍），也根本不可能知道每个个体的实际脑容量。

在物种之间，平均脑容量的差异比较大：黑猩猩的平均脑容量为400毫升，明显与我们人类（平均脑容量为1350毫升）不同。不过，在对比时还应考虑两个物种的身材差异，因为黑猩猩比人类小很多。理论上说，脑容量随着身材的增加而增加，但二者并非成正比例关系。人类的大脑只占体重的2%，鼩鼱的大脑却能占到体重的10%！

因此，在比较两个物种时，更多的是对比它们的脑化指数。所谓的脑化指数，指的是动物的实际脑大小与根据体重得出的预期脑大小之间的比值。人类的脑化指数比其他物种高，约为7.5，这说明人类大脑的大小是同等体形哺乳动物预期脑大小的七八倍。黑猩猩的脑化指数为2.5，海豚的脑化指数为5.3。
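脑化指数的算法可以用一个小例子来示意。需要说明的是，正文并没有给出“预期脑大小”的计算公式；下面采用的是常被引用的杰里森（Jerison）经验式，即预期脑重约等于 0.12 × 体重的三分之二次方（单位为克），这属于本示例的假设。示例中的体重数值（人65千克、黑猩猩45千克）以及把1毫升脑容量近似当作1克脑重的做法，同样只是为演示而设的粗略假设，因此算出的结果只能与正文的数字大致相符。

```python
def expected_brain_mass_g(body_mass_g: float) -> float:
    """按杰里森经验式估算同等体重哺乳动物的预期脑重（克）。假设的公式，正文未给出。"""
    return 0.12 * body_mass_g ** (2.0 / 3.0)


def encephalization_quotient(brain_mass_g: float, body_mass_g: float) -> float:
    """脑化指数 = 实际脑大小 / 根据体重得出的预期脑大小。"""
    if brain_mass_g <= 0 or body_mass_g <= 0:
        raise ValueError("脑重和体重都必须为正数")
    return brain_mass_g / expected_brain_mass_g(body_mass_g)


# 脑容量取自正文（人1350毫升、黑猩猩400毫升），按1毫升约合1克粗略换算；
# 体重为假设值，仅作演示
eq_human = encephalization_quotient(brain_mass_g=1350, body_mass_g=65_000)
eq_chimp = encephalization_quotient(brain_mass_g=400, body_mass_g=45_000)

print(f"人类脑化指数约为 {eq_human:.1f}")
print(f"黑猩猩脑化指数约为 {eq_chimp:.1f}")
```

换用不同的体重或不同的经验式，数值会有所浮动，但人类明显高于黑猩猩这一点不会改变。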

与南方古猿相比，最初的人属生物不但拥有更大的大脑，还拥有更大的身体。其实，直至大约50万年前，人亚族的大脑主要都是随着身材的增大而增大的。在那之后的脑容量才是真正地增加了。

无论是脑容量还是脑化指数，都不足以描述人属身上实际发生的变化。其他因素与智力（这里简单理解为解决新问题的能力）的关联更紧密，比如大脑皮层（即大脑表层灰质）神经元的数量及功能（即神经元与其他神经元连接的能力或神经冲动的传导速度）。

脑同样在颅骨上留下了自己的印记，这就为我们提供了一些与脑的构造有关的信息，比如大脑各个脑叶的相对大小或左脑与右脑的差异。人类的演化同样伴随着脑部结构的改变，这些改变可能与脑容量的增加具有同样重要的意义，但是化石并没有给出大脑构造的相关细节（除非化石中保留了DNA，参见《既是智人又是现代人！》）。

直立人是个大个子

1984年，纳利奥科托美（Nariokotome）男孩（又称“图尔卡纳男孩”）的发现，使我们对最早人类的了解向前迈出了一大步。这是一副近乎完整的骨架（只缺了手和脚），其生理特征与我们更加接近。

这个骸骨化石可以追溯至大约150万年前，由理查德·利基（Richard Leakey）考古队的卡莫亚·基穆（Kamoya Kimeu）在图尔卡纳湖畔发现。图尔卡纳男孩死亡时仅有8岁，身高刚过1.5米。成年后，身高或许将达到1.7米，甚至更高。一开始，人们认为它已经11岁了，而且身材更加高大，但牙齿化验结果显示，它的发育速度比人类快了不止一星半点——才到8岁，它就几乎完成了身体发育！与南方古猿相比，它的骨骼与人类更加相似，四肢比例非常接近人类。骨盆和股骨的结构说明它善于行走甚至能够奔跑，但论起攀缘树木或许就不是祖先的对手了。2009年，人们在肯尼亚发现了这一物种的脚印，其中不少与智人的脚印难以区分。

图尔卡纳男孩的颅骨相对较小，面部向前凸出。牙齿比人类粗大，但与能人相比还是有所减小。眼眶上方有粗壮的眉骨，额头后倾，几乎没有下巴。

凭着800毫升左右的脑容量，直立人的演化程度比能人略高，但直立人的身材可比能人高大得多。不过，与能人相比，直立人的大脑更加不对称，布罗卡区和韦尼克区（对人类语言能力至关重要的两个脑部区域）比较发达，但这并不意味着直立人具有语言能力，因为那还需要咽喉的结构满足条件才行。然而，化石并未给出与咽喉结构有关的任何信息。

直立人和智人的颅骨对比

最开始，人们将图尔卡纳男孩确定为直立人（迄今为止还没有发现如此完整的直立人骸骨）。随后，一些古人类学家认为，同亚洲发现的直立人相比，图尔卡纳男孩足够不同，完全可以视之为另一个物种。最终，图尔卡纳男孩被定名为匠人（Homo ergaster），并被视为亚洲直立人的非洲先辈。

无论被称为匠人还是直立人，该人属物种自190万年前起便生活在非洲大地上。人们曾经认为它们是能人的后代，不过人们已经发现了二者的同时代化石（距今约150万年），这说明它们曾经在这个星球上共存了至少50万年。或许，生活方式的不同削弱了相互之间的竞争。与直立人相比，能人的食性更加偏向素食。

同步加速器带来的发现

X射线微断层扫描（X-ray microtomography）可以非常精确地探测骨化石内部且不会损坏化石。在牙齿上取得的研究结果格外引人注意。随着生物个体不断发育，坚硬无比的牙釉质在牙齿表面逐渐沉积。借助同步加速器，可以发现以细纹形式存在的牙釉质沉积。这么一来，就能确定生物个体生命中重大事件的发生日期，比如出生或断奶，因为这些事件都会在牙釉质中留下痕迹。

通过对牙齿微观结构的观察，我们发现南方古猿的发育速度很快，与黑猩猩的发育速度很接近。图尔卡纳男孩的发育速度相对缓慢，但与我们人类还是大有不同。

多种用途的两面器

如果我们在定义自己的物种时坚信历史和史前史所示的人类和智慧长久以来的特征，那我们或许不会自称智人，而会自称工匠人（Homo faber）。 ——亨利·伯格森（Henri Bergson），1920

与图尔卡纳男孩同时出土的还有一些砍砸器，和能人制造的砾石砍砸器类似。不过，在这个时期，非洲大陆上已经出现了新的工具——两面器。所谓的两面器，是指加工成杏仁状的石头，多多少少呈椭圆形或三角形，两个侧面做了对称加工，两面之间是锋利的刃口。

迄今为止发现的两面器最早可追溯至大约170万年前，都是直立人制造的。直立人还制造了与两面器类似的手斧，二者的区别在于，手斧的一个面未经加工，且刃口几乎与其自身中轴垂直。制造过程中产生的碎片，直立人也不会丢弃，而是通过打磨将其改造成较小的工具。

属于阿舍利文化的石质工具

根据1872年在圣阿舍利（法国亚眠下辖地区）发现的两面器，人们将这个文化命名为“阿舍利文化”（Acheulian）。阿舍利文化紧接奥杜韦文化而来，但二者的石器制造技术在时间和空间上都有所重叠。奥杜韦文化和阿舍利文化共同定义了旧石器时代早期。

人们先后在近东和印度发现了两面器，其历史可追溯至大约150万年前。欧洲最早的两面器诞生于距今65万年前后。阿舍利文化在大概30万年前逐渐被最初的智人和尼安德特人特有的莫斯特文化（Mousterian）所取代。

两面器的产生是重大技术变革的结果。这是因为两面器的制造有两个前提：首先，要事先对所需工具有精确的初步设想；其次，要拥有比制造砾石砍砸器更高超的手艺。在两面器中，史前史学家还看到了有美感的外观，以及创造对称形工具的主观意愿，而这可比制造单纯满足具体用途的工具要复杂很多。

另外，对于直立人怎么使用两面器，我们依然没有头绪。当然了，两面器能用来切割肌腱、剥离关节或砸开骨头以获取骨髓，为直立人食用尸体提供了极大的便利；在牛尸上进行的试验也为此提供了佐证。但是，两面器的造型多种多样，想必还有其他用途，比如挖掘土地、砍斫树干、刺穿皮肤甚或击打对手（参见《狩猎与传统》）……此外，还可以通过不断的打磨对工具进行改造并改变其用途。

随后，在距今50万年前后，出现了以骨头或鹿角制成的“柔软”手锤，这使得打磨的精度更高。借助这种手锤，直立人使用在远方发现的奇石精心制造了用于祭祀或象征威望的两面器。

新面貌

猴子共有193种，其中192种身披毛发，唯一一种全身光滑无毛的猴子自称为智人。 ——德斯蒙德·莫里斯（Desmond Morris），1967

有些古人类学家，比如丹尼尔·E. 利伯曼（Daniel E. Lieberman），将直立人的奔跑能力视作人类世系演化的关键因素。出色的体能加上可用作武器的先进工具，使直立人成了人类历史上第一个真正的猎手。

在稀树草原上，许多动物跑得比人快，但能与人类一样长时间奔跑的却少之又少。人类的真正特长其实是耐力！我们可以非常容易地想象直立人通过追逐而累垮猎物的画面。当然，这里说的猎物可不是那些体形庞大的野兽，而是羚羊或野兔这样的小动物。

直立人之所以善于奔跑，是因为获得了修长双腿之外的新特征。直立人身材更加苗条，胸腔更加呈圆锥状。由于摄取的植物性食物减少，它们的肠道变得更短，腹部也变小了。由于身处热带，它们应当出汗很多。汗液的蒸发快速消散了肌肉运动产生的热量，这是调节体温的有效方式，而动物往往因为不能这样调节体温而耐力受限。

直立人大量出汗，是因为皮肤上有数以百万计的汗腺，这意味着直立人已经失去了祖先曾长有的绝大部分毛发。至于人类是什么时候失去毛发的，化石没有给出任何信息，但失去毛发与善于奔跑有所关联并非毫无根据的假设。

为了了解得更多一些，我们可以问问……虱子！所有灵长目动物的身上都有寄生虫。在今天的人类身上，甚至生活着好几种不同类型的虱子：头虱、体虱、阴虱。这几种虱子之间互有亲缘关系，与黑猩猩或大猩猩身上的虱子也有相似之处。通过分析它们的DNA，我们得到了与人类演化有关的非常有趣的信息。事实上，头虱和体虱是近亲，十来万年前才开始分化，而它们的分化可能与衣物的诞生有关。另外，它们与生活在黑猩猩身上的虱子还有共同祖先，这个共同祖先生活在大约560万年前，而人类和黑猩猩两个支系差不多就是在这个时候分道扬镳的。

人科物种身上虱子的系统发生树

至于阴虱，则与生活在大猩猩身上的虱子亲缘关系较近，二者在约330万年前发生分化。在此之前，虱子应当可以轻而易举地在人亚族和大猩猩族之间传播，至于传播途径，或许是人亚族和大猩猩族重复使用每日在树下形成的枯枝落叶层。不过，两种虱子的分化表明其生存环境变得有所不同，这或许与第一批人属失去毛发脱不了干系。于是，原本生活在大猩猩身上的虱子继续在大猩猩的每一寸毛发中繁衍生息，而生活在人属身上的虱子最终选择蜗居在阴部。这么说来，失去毛发应当比人类诞生还要稍早一些！

毛发的减少还产生了另一个结果。赤道地区光照强烈，这就要求对皮肤提供强有力的保护，以使其免受危险的紫外线的伤害。在毛发的保护下，南方古猿的皮肤可能呈浅色，就像经常在现存猿猴身上观察到的一样。而原始人裸露在外的皮肤中快速积累了大量黑色素，在保护皮肤的同时，这种物质还或多或少给现代人类皮肤着色。

“开枝散叶”的原始人

迄今发现的原始人属物种遗骸，构成了一幅马赛克镶嵌画，各个物种随着最新的解读、重建和发现改变着自己的位置。如上所述，人们已经描述了能人、鲁道夫人和直立人，这三个物种似乎曾经共存，或者至少在某些时期共存。

有些研究人员质疑是否应当将人属分成几个物种。他们的主要依据是1991年至2005年在格鲁吉亚德玛尼西发现的工具和骸骨，其中有五个保存相当完好的颅骨，均可以上溯至大约180万年前。第五个颅骨，代号为D4500，与五年前出土的一个下颌骨极为相配，其脑容量约为550毫升，接近能人的最小脑容量，面部与直立人类似，牙齿则与鲁道夫人相仿。另外四个颅骨的脑容量稍大，为630毫升至700毫升。这些颅骨和颌骨均具有镶嵌演化特点；兼而有之的原始特征和衍生特征，将它们与同一时代的全部人属物种（能人、鲁道夫人、匠人）关联起来。

然而，这些在同一个地方发现的属于同一个时代的颅骨大有可能属于同一个种群。最初对它们进行描述的研究人员认为，这些颅骨彼此之间差异很大，倘若出土于不同的地点，很可能会被视为五个不同的物种。实际上，它们的差异之处应归因于年龄的不同：牙齿分析结果表明，一个颅骨的主人当属“英年早逝”，而另一个牙齿掉光的颅骨，显然来自一个垂暮老者。此外，还应当考虑它们的性别和种群内部的个体变异性。

2013年，颅骨的发现者——大卫·罗德基帕尼泽（David Lordkipanidze）及其同事——发布了关于这个种群的分析报告。在他们看来，在个体变异性方面，这个种群足以与人类或黑猩猩等量齐观。正因如此，他们提议将这一时期的人属生物全部划入同一个物种——直立人。不过，由于忽略了与其他地区出土的化石的实际差异，这种“一刀切”的归并方式没有得到全体古人类学家的赞同。

德玛尼西出土的五个颅骨的3D复原图

2013年，美国古人类学家李·R. 伯格（Lee R. Berger）带领考古队在南非的新星（Rising Star）洞穴——距离他发现南方古猿源泉种的地方仅有1 000米远——发现了大量的骨骼化石。总数超过1 500件，来自至少15个年龄各异的个体，通过拼凑可以得到几乎完整的骷髅。这一发现——至少在数量上——堪称古人类学史上最为重大的发现，也为“枝繁叶茂”的人类演化树增添了新的枝叶。

与南方古猿类似，这些骸骨兼具原始特征和现代特征，身材短小，脑容量也小（约为500毫升），乃至于李·R. 伯格将其视为新物种，命名为纳莱迪人（Homo naledi），并将其视为人类的潜在祖先。纳莱迪人的手臂适于攀缘，但双手似乎能够进行精细操作，髋关节与露西类似，双足则极具现代特征。同南方古猿源泉种一样，纳莱迪人也表现出兼具原始特征和衍生特征的镶嵌演化现象。而某些骨头，比如股骨，则带有在南方古猿和现代人身上都前所未见的独特细节特征。

一般而言，骸骨大量累积是掠食者进食或地下河冲刷造成的。由于遗址里没有发现羚羊或其他动物的骸骨，李·R. 伯格断定这个遗址应当是纳莱迪人丧葬行为的结果。不过，绝大多数专家都不认同这个假说，因为如果这个假说成立，那就意味着纳莱迪人已经学会了使用火，否则它们是无法抵达洞穴底部的，然而，迄今为止尚未发现脑容量那么小的人亚族成员会使用火。

起初，经过测定，骸骨的年代为大约200万年至100万年前，这就很难确定其在人属中的演化位置了。2017年人们进行了第二次测定，确定其年代仅为大约30万年前，这样一来，解释它们在人类系谱图中的位置变得更加复杂。尽管骸骨数量众多，但这些年代上的问题使众多研究人员不得不就纳莱迪人的演化位置乃至其是否仍可被归为人属进行激烈的辩论。此外，在尚未于科学期刊中详细描述考古发现前，李·R. 伯格就将其大肆展示并发表在大众杂志上，学界对此也是持保留态度的。

第四章　去往世界尽头

如今，智人已经遍布全球。所有的发现和证据都令人猜想人类起源于非洲。如果果真如此，那我们还需要弄明白下面这两个问题：我们的祖先是怎么迁移并占领其他大陆的？这些迁移的原始人在现代人的诞生过程中发挥了什么样的作用？

走出非洲

解决方法路上找。 ——古代谚语，传为第欧根尼所言

许许多多的考古发现将人类祖先走出非洲并逐渐占领欧亚大陆新地盘的日期向前推，而且越推越远。

虽然已经成了约定俗成的用语，但是“走出非洲”这个说法并非没有缺陷。“走出非洲”，给人的感觉像是一蹴而就，而事实上，我们的祖先是追随着角马或斑马迁徙的脚步逐渐扩大自己的分布区域的。角马或斑马的迁徙受制于气候变化，追随着它们的步伐，以打猎和采集为生的原始人便得以探索新的地域。或许，这些原始人也曾短暂地面临人口增加的压力。其探索行为并非出于自愿或事先规划，而是时快时慢的整体移动，在几千年间不断持续进行，且在个体层面几乎难以察觉。即便每一代人的移动距离可能都不到10千米，在几万年的持续迁移中，还是有些原始人能够到达远东的，而大部分年代测定技术的精度甚至还达不到几万年。
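“每一代人移动不到10千米，几万年后却能抵达远东”这笔账可以简单验算。下面的草图仅作示意：每代人25年、从东非到东亚约1万千米这两个数值都是为演示而假设的粗略估计，正文并未给出；函数名 `years_to_cover` 也是假设的。

```python
def years_to_cover(distance_km: float, km_per_generation: float,
                   years_per_generation: float) -> float:
    """按每代人固定的移动距离，估算扩散到给定距离所需的年数。"""
    if km_per_generation <= 0 or years_per_generation <= 0:
        raise ValueError("每代移动距离和每代年数都必须为正数")
    generations = distance_km / km_per_generation
    return generations * years_per_generation


# 假设值：路程约1万千米，每代人移动10千米，每代人约25年
years = years_to_cover(distance_km=10_000, km_per_generation=10,
                       years_per_generation=25)
print(f"约需 {years:.0f} 年")
```

在这些假设下，所需时间是“几万年”的量级，低于大部分年代测定技术所能分辨的精度，这也是正文强调这种扩散在个体层面几乎难以察觉的原因。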

在红海和里海之间的德玛尼西（位于格鲁吉亚）发现的人属物种化石证明，远在大约180万年前，原始人的足迹便已踏上欧亚大陆。它们使用的工具不过是造型简单的砾石砍砸器，与能人制造的工具类似。还有人提出，人类走出非洲大陆的日期其实更早。2016年，在印度北部的马索尔（Masol）挖掘点出土的牛科动物骸骨上发现了食腐的痕迹。随着这些骸骨出土的也有砾石砍砸器，其年代甚至可以追溯至距今260万年。

改变

在迁移过程中，原始人面临着与非洲大陆截然不同的生活环境。生活环境、生活方式及周围物种的不同，导致它们沿着不同的演化路径分化，尤其是在出现了长期的地理隔离后。领地的不断扩大，伴随着纷繁丛杂的演化，最终导致多个人属物种的诞生。

1891年，欧仁·杜布瓦在印度尼西亚的特立尼尔（Trinil）挖掘点发现了历史上第一个直立人化石，并将其命名为直立猿人。随后，在爪哇岛发现了这个物种的其他骸骨，比如可以追溯至大约80万年前的桑吉兰（Sangiran）颅骨。从1921年到1937年，在北京附近的周口店出土了属于同一物种的大量骸骨，起初人们将其命名为中国猿人（昵称“北京人”）。可惜，二战期间，这些化石在运往美国的途中全都消失在了茫茫大海上。

到了20世纪50年代，古人类学家达成了共识，将在亚洲发现的这些化石全部归为“直立人”。直立人五官粗犷：眉骨粗壮，眼眶后侧颅骨缩小，额头后倾，臼齿较大，几无下巴（参见第62页插图）。颅骨向后伸长，像是在枕骨上盘起的发髻。颅骨骨壁极为厚实。脑容量通常在800毫升至1 100毫升之间（如果将德玛尼西发现的化石也考虑进来，那这个数值还要减小）。颅盖的形状很有特点。直立人的颅骨骨壁向着头顶的方向收缩，而智人颅骨的最宽处位于头颅中部。

然而，亚洲的直立人与非洲的直立人并无根本性的差异。1978年在中国陕西大荔发现的距今26万年的颅骨与在赞比亚发现的年代稍古老些的卡布韦（Kabwe）颅骨及在希腊发现的佩特拉罗纳（Petralona）颅骨非常相似。这些相似之处令人猜想，直立人在旧世界广泛分布，且没有出现大的分化。尽管四散在世界各地，直立人各个群落之间的基因交流或许仍在进行，并没有发生具有决定性意义的中断。

最初的欧洲人

欧洲最为古老的人亚族化石是2018年在西班牙奥尔塞（Orce）发现的一颗人亚族生物牙齿，其年代约为距今140万年。除此化石孤本以外，已经证实的最为古老的欧洲人亚族骸骨发现于西班牙阿塔普埃尔卡（Atapuerca）的几个矿层里。

在“象坑”（Sima del Elefante）里，出土了几个人亚族骸骨化石和一些人亚族活动的痕迹，还有几件奥杜韦风格的工具。经测定，这些骸骨化石的年代为距今120万年，被认为是直立人。与此相比，邻近的格兰多利纳（Gran Dolina）洞穴的骸骨更加丰富，共出土了距今78万年的约80件骨骼碎片。由于与直立人的骨骼稍有不同，这些骨骼碎片被命名为前人（Homo antecessor，又称先驱人）。前人使用的工具也是砾石砍砸器，没有一丁点儿两面器的特点。2013年，在英国黑斯堡（Happisburgh）的海滩上发现了同一年代的脚印，研究认为是两个成年原始人和几个儿童留下的。

凭着另一个年代更近的矿层，阿塔普埃尔卡挖掘点享誉世界，这便是被称为Sima de los Huesos的“骨坑”。自1984年开始挖掘以来，坑中已经出土了几个完整颅骨和6800余件其他骨骼，它们属于距今大约43万年的至少28个个体。今天，这些古人类被命名为海德堡人（Homo heidelbergensis），其身材高大强壮，男性平均身高为1.7米，女性平均身高为1.57米，男女之间的身高差与我们差不多。海德堡人能够制造阿舍利风格的两面器。

海德堡人的骸骨兼具原始特征和衍生特征：它们拥有与直立人相似的粗犷面部；两个眼眶上方均有眉骨，但直立人的两个眉骨是连在一起的；脑容量为1 230毫升，比亚洲直立人的脑容量平均值高出很多。一些解剖学特征，比如不明显的颧骨，令人将它们与稍晚时候生活于同一地区的尼安德特人关联起来（参见第94页《欧洲的尼安德特人》）。此外，DNA分析结果也显示海德堡人可能是尼安德特人的直接祖先。

另一些在欧洲发现的年代更近的化石也表现出类似的特征，比如亨利·德·拉姆利（Henry de Lumley）团队于1969年在法国东比利牛斯省阿拉戈（Arago）洞穴发现的距今45万年至30万年的陶塔维尔人（Tautavel Man），或在德国发现的施泰因海姆（Steinheim）颅骨和在希腊发现的佩特拉罗纳颅骨。许多专家都将它们视为海德堡人。在一些研究人员看来，海德堡人是严格意义上的欧洲人种，是从至少100万年前来到欧洲的直立人发展而来的“前尼安德特人”。在演化过程中，这些直立人想必经过了前人阶段。

假说一　人属多个物种的两种系统演化关系假说

假说二　人属多个物种的两种系统演化关系假说

然而，即便直立人、前人、海德堡人这几个不同的类群相继出现并生活在同一地区，却没有任何证据表明它们互为亲代子代。理论上讲，一些类群非常可能灭绝且没留下任何后代，并被来自其他地方的新种群所取代。

于是，另一些研究人员就认为，直立人至少曾经两次向欧洲移民。第一拨移民在120万年前抵达欧洲，它们是前人的祖先，但在60万年前的大冰期期间全部消失了。或许，来自非洲的原始人取代了它们；这些原始人是尼安德特人的祖先，拥有更大的大脑并带来了阿舍利文化。

在这种情况下，海德堡人就是一个欧非物种。在大约30万年前，海德堡人在欧洲诞下了尼安德特人（以及丹尼索瓦人），并在非洲诞下了智人。在此需要明确说明一下，某些被视为海德堡人的化石，特别是卡布韦颅骨，最初被命名为罗德西亚人（Homo rhodesiensis）。许多专家提议将二者统一归入海德堡人名下，但另一些人仍坚持将二者区分开来的做法。

狩猎与传统

无论是晚期直立人还是海德堡人，它们都是优秀的猎手，能够猎杀大型动物，比如马甚或犀牛。也是它们最先在长矛上安装尖利的石头以增强杀伤力。在英国博克斯格罗夫（Boxgrove）曾经发现了一块似乎曾被燧石刺穿的马肩胛骨，其年代为距今50万年。

同样，在德国的舍宁根（Schöningen）发现了三件40万年前制作的标枪。标枪为木质，长达两米，保存状况极佳。标枪被精心切削成流线型，重心位于距前端三分之一处，以便投掷。

猎杀大型猎物是一项非常复杂的活动，需要参与者之间的高度配合。某些研究人员认为，这项活动证明了参与者之间还存在精心设计的沟通语言。在“骨坑”中曾经出土了两块舌骨。舌骨位于咽部上方，是舌头肌肉和咽喉肌肉的附着点，在产生语言的过程中发挥着重要作用。阿塔普埃尔卡发现的舌骨与尼安德特人和智人的舌骨相仿，但这并不足以证明海德堡人拥有语言。

另一个问题涉及埋葬行为。同样是在“骨坑”，所有的骸骨都集中在同一个地点。一些研究人员猜想，它们是在死后被有意丢进坑里的，而且一同丢进坑里的还有漂亮的红黄色两面器。不过，有确切证据的安葬行为是在更近的年代（大约距今10万年）才发生的。

在出土的骸骨中，有个颅骨的额骨上有两处骨折。根据对骨折情况的详细研究，这个人应当是在致命的搏斗中被敌人用同一件武器两次猛击头部而死的。这么一来，这个颅骨可以算是一桩最古老悬案的证据了。

许多挖掘点都发现了食人的痕迹。在格兰多利纳洞穴发现的骸骨中，不少被斩首，骨头上布满被砍的痕迹，还被砸开以吸食骨髓，处理方式和动物（野牛或鹿）骨头别无二致。究竟是为了生存而不得不食人还是祭祀性的食人，现在已经不得而知。由于许多受害者都是儿童，人们猜想当时的原始人视邻近部落的年轻人为捕猎对象。这种行为在陶塔维尔也有发现，在当时似乎相当普遍且并未随着时间的推移而消失。

火的掌控

乌拉穆尔人（Oulhamr）逃进了可怕的黑夜里。他们痛苦不堪、筋疲力尽，在圣火已灭的终极灾难面前，一切似乎都失去了意义。 ——J.H. 罗斯尼·埃内，1911

火，是人类驾驭自然的最古老象征之一。在能够自行生火前，我们的祖先或许先是学会了如何掌控并利用雷电或火山喷发引发的火。

迄今为止，考古学家已经发现了许许多多与火有关的遗迹，有木炭，也有因受热而崩裂的石头。不过，区分自然火灾的痕迹和人工生火的遗迹已非易事，证明某个火堆的确是某个原始人类点燃的更是难上加难。

在南非奇迹洞（Wonderwerk Cave）发现的古老篝火遗迹（可追溯至距今100万年）似乎是烤炙骨头留下的。原始人应当是使用了取自大自然的火来烹饪肉食。如此一来，有些研究人员认为，人类开始烹饪食物的时间远远早于炉灶的出现。但是，依然无法确定这个时期的原始人是否确实学会了使用火。

生火方法

原始人主要使用两种方法生火。第一种方法利用两块木头相互剧烈摩擦产生的热量引燃干草，遗憾的是，用来取火的木制工具几乎没有留存下来。第二种方法利用的是燧石突然摩擦富含硫化铁的石头时产生的火花（两块燧石相互摩擦是没有用的），这种生火工具出现的年代较晚（距今不到3万年），留下的遗迹也极为罕见。

已经确认的最古老的炉灶出现在距今约40万年，人们在法国布列塔尼的梅奈兹—德雷冈（Menez-Dregan）遗址和中国北京的周口店遗址都发现了原始人搭建的炉灶。这些炉灶呈圆形，以煅烧后的石头垒成，炉灶中有灰烬和木炭，还伴有其他原始人类活动的痕迹，比如动物骸骨或石质工具。

当然，火有着多种用途。火可以驱散掠食动物，可以温暖并照亮营地或住所，可以加热矛尖使其变硬，可以为石头切削提供便利，还可以弄熟食物并有助于保存肉或鱼。对于来到了高纬度地区的原始人而言，燃起的篝火延长了它们在日落后或冬天里的交流时间，进而在它们的社会化过程中发挥了某种作用。

烹饪食物时会散发出诱人的香味。不能生吃的食物在弄熟后也会变得美味可口。火还能降低因食物腐坏或感染致病微生物而导致中毒的风险。另外，弄熟之后，块根和块茎的咀嚼和消化消耗的体能更少，也更容易消化吸收，并为机体提供更多的能量。原始人就不必再花费大量时间寻找食物。这样看来，煮熟食物与人属大脑的增大和牙齿的减小或许有着一定的关系。

第五章　其他人属物种

我们智人的祖先曾与其他人属并肩同行。显而易见，这些人属与我们的祖先拥有相同的远祖，但在进化的道路上，它们与我们的祖先分道扬镳了。也许，我们的祖先和它们都视彼此为同类，并曾共同生儿育女，但由于差别实在太大，不能合而为一。这些“其他的人属”，在我们的历史上书写了迷人的篇章，使我们得以遐想自己可能会成为的模样。尼安德特人在大约4万年前灭绝了，而我们的祖先则生存了下来。比起它们的存在，尼安德特人的灭绝更令我们着迷。

欧洲的尼安德特人

在“其他人属”中，尼安德特人是当今最为知名的，或许也是与我们最接近的。一个半世纪以来，史前史学家就尼安德特人与我们的异同展开了旷日持久的探讨，与我们如此相似却又和我们如此陌生的尼安德特人，深深地吸引了公众的注意。

在很长的一段时间内，尼安德特人被视为我们的祖先。后来，尼安德特人被降格为近亲，成为智人下面的一个亚种。再后来，尼安德特人被从智人中分离出去，成为与我们不同的新物种。1997年，通过分子生物学领域的研究，人们先是确认了尼安德特人与人类的分野，随后发现了两个物种之间的杂交特征。

根据“骨坑”出土化石的DNA分析结果，在大约76.5万年前至55万年前，智人（现代人的直系祖先）和尼安德特人的共同祖先海德堡人生活在非洲的某个地方（参见第140页《史前DNA》）。海德堡人是直立人的后代，但与直立人并无太大不同，只是比直立人稍微高大一些，也稍微更像人类一些。海德堡人的一些后代后来到了欧洲，并在欧洲继续演化，最终形成了尼安德特人。

根据西班牙“骨坑”出土化石的DNA分析确定的现代人类、尼安德特人和丹尼索瓦人的共同起源示意图

尼安德特人在演化过程中也发生了变化：生活在10万年前的尼安德特人与生活在40万年前的前尼安德特人是有所区别的，尽管后者已经具有了某些标志性的特征。为了更好地与智人相区分，人们往往描述的都是前者，它们的化石是19世纪发现的史前原始人化石的组成部分。

尼安德特人和智人的颅骨对比

与同时代的智人相比，尼安德特人的个头较小（男性身高1.7米、女性身高1.6米）。尼安德特人面容粗犷，骨头厚实，肌肉强健。与我们相比，尼安德特人的颅骨较大但偏低，呈伸长状，平均脑容量比我们大，为1 500毫升。相应地，尼安德特人的面部较大，且向前凸出，鼻子大而长，颧骨不太明显，额头后倾，下巴也向后倾斜，与智人的外貌迥异。

尼安德特文化

最为古老的尼安德特人生活在阿舍利文化时期，能够制造两面器。到了大约30万年前，在继续制造两面器的同时，尼安德特人开发了新技术，即“勒瓦娄哇”（Levallois）切削技术（以在巴黎近郊勒瓦卢瓦采石场发现的石质工具命名）。原石（或石核）经过一系列击打除去碎屑后，再一击定型，得到设想的工具。

使用这种切削方法时，不但要对所用材料有很好的了解，还要拥有相当的手艺。这种方法增加了用一个石核制造出的工具的数量。由勒瓦娄哇切削技术得到的石片和尖状器定义了一个全新的石器文化——莫斯特文化。从时间上看，莫斯特文化对应的是距今30万年至4万年的旧石器时代中期。

勒瓦娄哇切削技术器

属于莫斯特文化的石质工具

莫斯特文化在欧洲与尼安德特人相关联，在非洲却与早期智人相对应。究竟是欧洲的尼安德特人还是非洲的早期智人发明了这项制造技术，我们无从知晓。是它们分别独立发明了这项技术吗？还是它们之间的接触实现了新技术的传播呢？而2018年在印度发现的距今38.5万年的勒瓦娄哇风格的工具，更是使得笼罩在这项技术上的迷雾变得越发浓重。

尼安德特人非常关注所用燧石的属性。人们发现，尼安德特人通常只开采居住地附近的石头（5千米范围内），却将制造的工具携带到很远的地方（超过50千米）。它们占据了许多岩洞作为住所。或许，它们在露天自建的住所都是轻型建筑，所以没有留下任何痕迹。在它们的住所里，炉灶很常见，也使用了很长时间。

为了将尖状器固定在长矛上，尼安德特人会使用桦树的树皮制作黏胶，为此，需要按照精确的程序将树皮缓慢加热至一个相当低的温度。尼安德特人猎杀驯鹿和野马，也不放过原牛（现存家牛的祖先）和犀牛这些凶猛的动物。骸骨化石显示，一些尼安德特人曾经骨折，其伤痕形状与今天的牛仔驯服野牛时骨折的伤痕相仿。领地偏北的尼安德特人占据了和狼类似的生态位，食物中有丰富的动物性食物。

领地偏南的尼安德特人主要以植物、蘑菇和小动物（鸟、龟、鱼等）为食。由于冰期导致海平面变化，尼安德特人在海岸上的大部分居住点都消失了，但西班牙南部洞穴中发现的遗迹说明，这一食性在尼安德特人的演化历史上发挥了重要作用。尼安德特人能够用火烹饪食物，但并不是一直都这么做，也不是每个地方的尼安德特人都会这么做。一些古人类学家猜想，尼安德特人并不会生火，所以在某些居住点或在某些时期没有炉灶。不过，也有可能是因为在罕有木材的草原上很难维持火焰的持续燃烧吧。

人们同样注意到，尼安德特人身上没有智人拥有的AHR基因变体。肉类在烧烤时会产生有害分子（致癌的多环芳烃类物质），而AHR基因的作用就是降低这些物质的有害影响。智人似乎经历了AHR基因的强烈选择，尼安德特人则没有。

此外，与它们之前的（和之后的）人属生物一样，尼安德特人也经常有食人行为，正如法国阿尔代什省的穆拉-盖尔西（Moula-Guercy）遗址所示。

牙垢的用处

骨头部分由胶原蛋白构成。在胶原蛋白中，氮原子以不同形式（无放射性的同位素）存在，尤其是极为丰富的氮-14和非常稀少的氮-15。氮-14和氮-15这两种同位素所占的比例因个体死前十年间的饮食不同而异。实际上，植物中会富集少量的氮-15，食草动物中富集的氮-15比植物中稍微多一些，食肉动物中富集的氮-15又比食草动物中多一些。借助同位素化学的研究方法，能测定这两种同位素的比例。尽管年代久远，化石中依然含有少量的胶原蛋白，由此便可测定氮-14和氮-15的含量。氮-15含量高的，就说明食性偏肉食。
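上述思路可以用一小段 Python 粗略示意。同位素化学中通常用 δ15N 表示样品的氮-15/氮-14比值相对某个标准比值的千分偏差；下面的函数名、参照比值的具体取值以及三个样品的比值都是为演示而设的假设，并非真实测量数据。

```python
# 参照标准的 15N/14N 比值：这里取大气氮的常见引用值，仅作示意（假设）
AIR_RATIO = 0.0036765


def delta15n(sample_ratio, standard_ratio=AIR_RATIO):
    """返回 δ15N（单位：‰），即样品比值相对标准比值的千分偏差。"""
    return (sample_ratio / standard_ratio - 1.0) * 1000.0


def rank_by_meat_intake(samples):
    """按 δ15N 从高到低排列样品名称；δ15N 越高，食性越偏肉食。"""
    return sorted(samples, key=lambda name: delta15n(samples[name]), reverse=True)


# 以下比值均为虚构的演示数据
samples = {
    "植物": 0.0036875,
    "食草动物": 0.0036985,
    "食肉动物": 0.0037100,
}

for name in rank_by_meat_intake(samples):
    print(name, round(delta15n(samples[name]), 2))
```

排序结果只反映相对高低：要判断某个化石个体的食性，还需要与同一遗址中已知食性的动物的测定值做对比。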

此外，提取化石中的DNA并对其进行分析测序是另外一项应用日益广泛的化学技术。有待提取并测序的不是骨头中的DNA，而是牙垢中甚或洞穴土壤中含有的DNA。牙垢能够提供与食物相关的信息。即使在洞穴里没有发现任何骨骸，也可以弄清楚谁曾在洞穴里居住过或者什么东西曾经在洞穴里被吃掉。我们就是这样识别出了鬣狗和熊——后者常居住在洞穴里——以及猛犸象、犀牛、驯鹿和马，当然还有人类。虽然尼安德特人没有留下任何可见的遗迹，但我们依然发现它们曾经在洞穴里待过。

同样，通过分析身份不明的残骨的蛋白质（其实是古蛋白质组），也能弄清楚残骨属于什么物种，如果是人亚族的残骨，还可以弄明白它与已知谱系的基因相近度。

尼安德特艺术家

在几十万年的历史期间，尼安德特人的切削技术几乎没有任何改变。不过，在大约4.5万年前，差不多在第一批智人抵达欧洲的时候，不少文化上的创新横空出世。这些创新是文化同化现象，还是尼安德特人对智人技术的模仿，抑或是演化将尼安德特人推向了新的方向，现在已经不得而知。至今，史前史学家仍在就此进行激烈的争论。

其实，上述问题只是尼安德特人禀赋大辩论的冰山一角。在很长一段时间内，人们一直认为，迥异于智人的尼安德特人虽然拥有硕大的脑袋，但并不能进行创新，也没有任何艺术才能。然而，近期的发现对这种负面看法提出了质疑。

在意大利的一处尼安德特人居住地，人们发现了被拔去了羽毛的鸟的骸骨。这些鸟不是普通的鸟，而是秃鹫和胡兀鹫，它们的肉又硬又难闻。即便不了解尼安德特人的口味偏好，也可以猜想它们不是为了食用鸟肉而是要用鸟羽做饰品。

同样，尼安德特人也使用红赭石，有可能是为了装饰自己的身体或者在岩壁上作画。2018年，人们在西班牙发现了距今6.5万年（比智人抵达欧洲的时间早2万年）的岩画，这被认为是尼安德特人的作品，它们不但画了动物和几何符号，还留下了自己的手掌印。

不过，这处遗址的年代测定仍有争议，但另一处遗址的年代测定却是确凿无疑的。2017年，在法国的布吕尼凯勒（Bruniquel）洞穴里，发现了用石笋建造的环形建筑。经测定，其年代为距今约17.5万年，彼时在欧洲大地上生活的人亚族物种只有尼安德特人。为了完成这些功能未知的环形建筑，它们还点燃了木头和骨头取火。

最后，许多的墓葬证明，尼安德特人会保护亡者的尸体。由于它们不像后来的智人那样在掩埋亡者时埋入陪葬品，我们也不清楚它们是否会为亡者组织葬礼。

冰期的幸存者

由于某些历史原因（史前史研究始于西欧），大部分尼安德特人遗址都是在西欧发现的，但那里仅占尼安德特人疆域的五分之一。实际上，尼安德特人的活动范围直至西伯利亚边缘。它们曾在欧洲和中亚生活了几十万年，度过了好几次冰期和温暖的间冰期，曾在泰晤士河畔猎杀河马，也曾在西伯利亚追逐长毛犀牛。气候变化有时是非常迅速的剧变，持续不超过一代人的时间。

尼安德特人的栖息地时常被严酷的气候弄得支离破碎，人口数量也经历了数次明显的衰减。据估计，整个欧洲范围内的尼安德特人总数不超过6 000，这减少了不同族群之间的交流，或许还限制了文化演化的可能性。DNA测序结果显示，在西伯利亚发现的一位女性尼安德特人是近亲交配的产物（父母是同父异母或同母异父的兄妹或姐弟，甚至是舅甥或叔侄），而且近亲结合在它的祖上非常频繁。气候条件和隔绝状态似乎塑造了它们的历史。

尼安德特人的解剖特征或生理特征，究竟是隔离的体现还是适应生活环境的结果呢？人们猜想，粗壮的身材和短小的四肢是它们适应寒冷气候的结果，因为这种身形能够减少热量散失。在它们的DNA里，也找到了适应环境的体现（参见第140页《史前DNA》）。在尼安德特人体内，参与黑色素合成的MC1R基因拥有一个让其效率降低的突变。这意味着，尼安德特人皮肤和头发的颜色比非洲远祖的更浅。它们生活的地区光照较弱，患皮肤癌的风险较低，但缺乏维生素D的风险有所升高。而皮肤黑色素含量较低的话，有助于吸收维生素D合成所需的紫外线。

丹尼索瓦人

从今天起，我们可以大声宣布，人类类群比我们想象的更加多种多样、更加人丁兴旺，而且在演化过程中，同样也受到放之四海而皆准的生物规律的约束。 ——马塞兰·布勒（Marcellin Boule）

如上所述，60万年前抵达欧洲的海德堡人似乎是尼安德特人的祖先。根据“骨坑”出土骸骨的DNA分析结果，海德堡人还有另一个后代——丹尼索瓦人！

丹尼索瓦人的发现时间是2010年，发现方式非常独特，因为我们对它们的了解仅限于DNA。它们的名字取自西伯利亚的丹尼索瓦洞穴（Denisova Cave）。为了测定一个尼安德特人的基因组序列，研究人员在丹尼索瓦洞穴里提取了一些骨骼。但是，一块指骨中发现了意料之外的DNA，这个DNA既不属于智人又不属于尼安德特人，但是与尼安德特人的DNA比较接近。最终，研究人员得出结论，这个DNA属于一个新的物种，但是，除了这块指骨和一颗具有原始特征的牙齿以外，我们对这个物种的形态一无所知。人们将这一物种命名为丹尼索瓦人。在大约4万年前，这些丹尼索瓦人来过这个洞穴。

丹尼索瓦人于大约43万年前与尼安德特人分化。DNA显示，丹尼索瓦人的种群数量较为庞大，或许占据了很大一片区域，说不定直至东南亚。此外，丹尼索瓦人的DNA里含有尼安德特人和智人的DNA里没有的未知基因。这些基因有可能是它们通过与直立人杂交而获得的；直立人在此之前很久就走出非洲，并在亚洲一直生活到比我们想象中更晚的时期。

我们对丹尼索瓦人的外貌一无所知，因为迄今为止尚未发现任何丹尼索瓦人的骨骼化石；我们对丹尼索瓦人的文化也一无所知，因为尚未发现任何与它们有关联的原始工具。一些古人类学家提出，某些神秘的化石应该属于丹尼索瓦人，比如在中国辽宁金牛山发现的一具女性骨骼化石或1982年在印度发现的距今至少24万年的讷尔默达（Narmada）头盖骨化石（这个化石最初被认为属于晚期直立人或早期智人，随后被归为海德堡人）。不过，这些纯属假说，丹尼索瓦人身上依然迷雾重重。

弗洛里斯的“霍比特人”

2003年，由澳大利亚和印度尼西亚研究人员组成的联合考古队在印度尼西亚的弗洛里斯岛发现了人亚族生物的化石。它们的颅骨有些类似爪哇直立人，不过非常小，脑容量只有380毫升；个头只有1米高，脚却大得出奇。研究人员根据托尔金的小说《魔戒》将它们昵称为“霍比特人”。

这些化石可追溯至距今5万年，后来被命名为弗洛里斯人（Homo floresiensis）。尽管身材矮小，但它们并不是南方古猿，况且它们还能制造石质工具。一些古人类学家认为它们是直立人的后代，这些直立人到达亚洲之后由于隔离而演化出了矮小的形态。另一些古人类学家则觉得它们更像智人，是在更早的年代迁移到岛上的。2016年，在弗洛里斯岛上又发现了一块更小的下颌骨，其年代为距今70万年，似乎与直立人有亲缘关系。

生活在岛屿上的许多动物的身材都会缩小，生物学家将这一现象称为“岛屿侏儒症”。食草动物会向着矮小的方向演化，因为矮小的个头使它们能够轻而易举地在比大陆贫瘠的岛屿上发现食物。弗洛里斯岛上的古象（大象的亲戚）肩高只有1.8米，而附近大陆上的古象肩高甚至能达到4米。或许，正是这种现象导致了弗洛里斯人身材矮小。

其他人属物种的结局

在5万年前，地球上生活着数个人属物种：尼安德特人、丹尼索瓦人、弗洛里斯人及其他幸存的原始人类，当然还有智人。但如今，只剩下了我们智人，其他人属物种都消失不见了。是什么原因导致它们灭绝的？是气候变化，还是智人入侵？我们只能对尼安德特人的灭绝提出几种假说。至于其他几个物种，我们的了解太过零碎。根据我们的DNA里留存的痕迹，丹尼索瓦人在智人到达亚洲后就消失了，对于丹尼索瓦人的历史，我们所知仅限于此。

尼安德特人在欧洲生活了几十万年，度过了4次冰期和4次间冰期，直至智人的到来。尽管偶尔因为严寒和干旱背井离乡，尼安德特人还是很好地适应了环境。然而，尼安德特人的数量太少，而且散布在广阔无垠的领地上，由此导致近亲结合极为常见，这不利于它们适应新的状况，比如其他人属物种的到来。

在大约4万年前，或许还要晚一些，尼安德特人灭绝了。具体的灭绝日期尚不可考，因为人们依然无法确定某些遗址的归属。另外，尼安德特人不是同时灭绝的。在西班牙南部（尼安德特人占据的最后一片领地），尼安德特人或许继续存在了几千年，但这里发现的遗址的年代测定结果并未得到所有专家的认可。极有可能，尼安德特人和智人曾在欧洲共同生活了好几千年。

为了解释尼安德特人的灭绝原因，人们提出了各种假说。比如，剧烈的火山喷发（3.9万年前发生于意大利的火山喷发，火山灰一直飘到了俄罗斯）引发了突如其来的气候变化，最终导致了尼安德特人的灭绝；但是智人早在这次喷发之前就到达了欧洲。或者，智人传播了传染病，而尼安德特人对此没有免疫力；但是，尼安德特人种群散居各地，人口密度非常低，传染病是怎么扩散起来的呢？再或者，尼安德特人是被智人直接屠杀至灭绝的；就我们对自己所属物种的了解，这个假说倒也站得住脚，不过，已经发现的尼安德特人骸骨上鲜有直接暴力留下的痕迹，而且，如果真是智人将尼安德特人赶尽杀绝，尼安德特人又怎么可能继续生存几千年之久呢？

尼安德特人和智人处于竞争状态：它们猎杀同样的动物，栖身在同样的岩洞之下。即便二者没有直接冲突，这终归不是长久之计，其中之一必然要出局。不过，尼安德特人有个优势——它们生活在祖祖辈辈一直生活的环境中。当然了，这些也都是猜想。有人提出，智人比尼安德特人更能适应环境，在必要的时候，能够从猎杀大型猎物转为捕杀小动物或捕鱼。可是，人们在尼安德特人身上也发现了这样的饮食习惯。

或许，尼安德特人比智人低的生育率导致了竞争的加剧。同样，死亡率的不同进一步扩大了二者之间的差距。人们由此猜想，由于成年尼安德特人的大量死亡，只有极少的幼儿得到了祖父母的照顾，由此导致幼儿的存活率低，年轻人能够学到的生存技能也很有限。这就是所谓的“祖母假说”，不过这一假说缺乏证据的支撑。

上文所述的这些要素，每一个单独拿出来都不足以导致尼安德特人灭绝，但每一个都能够弱化本就非常稀少的尼安德特人种群。或许，应将上述各种要素综合起来考虑：严重的人口危机和偶发的种族冲突，增加了尼安德特人在与智人竞争时的劣势，最终导致了尼安德特人的灭绝？

第六章　最初的智人

在尼安德特人、丹尼索瓦人和直立人踏遍欧亚大陆的每个角落的时候，人类的演化在非洲仍在继续。根据各种可能性，最初的智人正是诞生于非洲。数不胜数的化石和基因证据，为智人的演化史提供了支撑。然而，在成功使别人信服之前，研究人员遭到了不少的反对，而且，这些反对意见往往并不是科学上的，而是哲学或政治层面上的。

智人的出现

至少30万年前，最初的智人似乎出现在了非洲大地上。让-雅克·于布兰（Jean-Jacques Hublin）和阿卜杜勒瓦希德·本-恩赛尔（Abdelouahed ben-Ncer）的考古队2017年在摩洛哥的杰贝尔依罗（Jebel Irhoud）挖掘点发现的两个颅骨就可以追溯至这个年代。与众多其他化石一样，这两个颅骨兼具祖先特征和衍生特征。颅骨的牙齿很小，还有下巴和颧骨，同较平的面部一样，都是很现代的特征。牙齿的发育细节也说明他们拥有与我们相近的发育时序。

两个颅骨的脑容量分别是1 300毫升和1 400毫升（现代人的平均脑容量为1 350毫升），但颅骨呈伸长状，这是一个明显的祖先特征。这两个远古智人骨头粗大，眉骨相当粗壮，面部也如海德堡人一样很大。然而，形态学统计分析的结果将这些细节归为智人颅骨变异性的范畴，这让他们成了已知最古老的智人。

3D复原

借助X射线微断层扫描技术（参见第63页《同步加速器带来的发现》），可以获得物体表面或内部结构的3D图像。这样一来，就能以虚拟的方式摆弄碎片，而无须将其从脉岩中取出，以免毁坏。此外，我们还能看到隐藏在骨骼内部的结构，比如内耳小骨。沉积层在化石上施加的压力会导致化石产生形变；而通过计算，我们就能够对形变进行校正，继而可以采用3D打印技术复制扫描对象。

颅骨的数字化还有另一个好处。它能使解剖测量工作变得更加容易，还能借助统计工具对颅骨进行对比从而得出较为客观的结果。我们甚至能够量化个体发育带来的形态变化，并通过儿童颅骨来推测创建成人颅骨的3D图像。就像DNA测序技术出现时一样，新技术工具势必会带来能够处理更多信息的数码工具。

扫描杰贝尔依罗挖掘点发现的化石后完成的早期人类颅骨3D复原

在这个发现之前，人们将智人的诞生追溯至更晚近的年代。在南非的弗洛里斯巴德（Florisbad）以及埃塞俄比亚的奥莫基比什（Omo Kibish）和赫托（Herto）也曾发现具有类似特征的化石，其年代分别为26万年前、19.5万年前和16万年前。这些化石将智人的诞生地定位在东非，而杰贝尔依罗发现的化石似乎否定了这一结论。无论如何，数量稀少的化石不足以在时间和空间上精确定位某个事件，比如新物种的诞生（参见第53页《地域偏见》）。

我们可以平行对比尼安德特人和智人的历史。这两个物种都向着脑容量大的方向演化，都发展出了比祖先更加复杂的文化。但是，除去骨骼上的差异外，智人的演化史上究竟发生了什么与众不同的故事呢？

考虑到人亚族演化过程中体重有所增加，人亚族大脑的增大实际开始于约50万年前，而且尼安德特人和智人都曾经历这一过程。然而，DNA的对比显示，尼安德特人和智人在一些重要方面发生了分化，特别是神经系统的发育和功能。这些基因突变中的一部分，至今仍存在于现在的大多数人身上，这表明，这些突变受到了强烈的正向选择。

正是通过孩子，我们才真正成为人。 ——让-雅克·于布兰，2017

人们已经发现的突变里，有一些关系到胎儿的大脑皮层发育，另一些则参与神经元连接形成或与神经冲动传导的基因有关。此外，还有在语言和说话能力的习得过程中非常活跃的FOXP2基因。尼安德特人和智人拥有的FOXP2基因为同一版本，且与祖先的不同。不过，智人身上出现了调节FOXP2基因表达的突变，在语言演化过程中，它或许发挥了某种作用。

胎儿和儿童各个发育阶段延长，是智人演化史上的关键事件。在一些古人类学家看来，这一转变实际上是一种形式的“幼态持续”。所谓的“幼态持续”，指的是生物个体在性成熟后仍然保留幼年特征。这种现象，在许多物种的演化过程中屡见不鲜，但在智人中，实际上并没有真正意义上的“幼态持续”。不过，胎儿的早产和幼年的延长，极大地提高了我们的学习能力，这在我们的演化史上产生了重大的影响。鉴于我们一生都保留着幼年的行为，比如难以满足的好奇心和对游戏的喜爱，说我们是“幼态持续”倒也站得住脚。

既是智人又是现代人！

接下来，非洲的智人一点一点获得了与我们相近的特征：骨骼较轻，颅骨较圆且稍小，面部缩小且更扁，下巴因颌骨和面部变小而显得突出。这些变化不是同步出现的：在面部获得更为现代的形态之后很久，颅骨才具有了现在的样子。

智人诞生的具体细节尚不可考，那能否至少明确智人哪里与祖先不同并确定使其成为新物种的特征呢？根据以往的经验，这并不复杂：只要注意到我们独有的特征在骨骼化石上的出现或消失即可。

不过，事情可没有那么简单！首先，尽管我们是“现代人”，但我们依然保留了一些原始特征，比如凹陷的眼眶使颧骨突显，而这些特征早就出现在南方古猿身上了！与此相反，尼安德特人颧骨较平的脸孔倒是个新的特征（即“衍生特征”），也就是说更“现代”！此外，最初的智人留下来的化石不但少之又少，而且同其他人亚族物种的化石一样，往往兼具原始特征和衍生特征。

这个问题或许看上去无关紧要，但其实牵连甚广。实际上，当人们试图制定原始智人颅骨与现代人颅骨的区别标准时，就有将某些现代人颅骨排除在现代人范畴之外的风险，最终导致人们毫无根据地判定不同种族的现代性。过去种种将人类分门别类的尝试导致了什么后果，我们都很清楚（参见第177页《人类种族存在吗？》）。

其实，并不存在公认的能将智人与其他人属物种区分开来的智人定义。自从DNA分析揭示了智人曾与尼安德特人和丹尼索瓦人杂交以来，这个问题变得更加复杂。实际上，一些古人类学家认为，应当扩大对我们所属物种的定义范围，将曾与智人杂交的全部物种涵盖进来。这种做法回归了生物物种的严格定义（物种是“互为亲代子代的或能够彼此交配繁衍后代的生物个体的集合”）。这些古人类学家主张将智人、尼安德特人、丹尼索瓦人归为同一个物种，这个物种下还将包括智人、尼安德特人、丹尼索瓦人的共同祖先海德堡人，甚至直立人。

在动物界，杂交是平常现象，比如，由共同祖先新近分化而来的两个姐妹物种之间往往存在杂交现象。在大多数情况下，杂交后代的生殖能力较低或根本不能生育，由此导致两个物种难以融合或不能融合。不过，杂交可能性的存在，并不妨碍动物学家将物种区分开来，尼安德特人和智人就属于这种情形。

此外，如果真的将我们这一物种的定义扩展至涵盖全部人属生物，那这个壮大的物种将具有比目前的智人或任何其他灵长目物种高得多的变异性。为了区分不同形态的人类，就得创造同等数量的亚种，这可一点儿也没有简化人属的“术语库”！所以，大部分古人类学家都认为，应当将智人这个名称保留给解剖学意义上的现代人。

伊甸园

众多研究人员认为，原始智人和现代人之间存在不连续性。根据他们的观点，人类演化史上应当有过“瓶颈期”，也就是导致物种多样性显著降低并改变物种演化路径的人口数量锐减期。

这些研究人员的主要依据是，最古老的智人骸骨彼此之间差异非常大，而且与现今的人类相比更加多样化。同样，人类历史悠久，但人类的基因多样性却没有预期的那么高，人口数量可能是造成这种情况的原因之一。在大约20万年前至15万年前，或许是巨型火山喷发而引发的极端天气，导致智人陷入了繁衍的瓶颈期。智人的数量或许从接近1万锐减至寥寥数百。有些人甚至精确提出，我们的祖先随后逃到了非洲的最南端，那里当时属于地中海气候，环境条件更加宜居。

也正是在南非，我们发现了智人在大约7.5万年前生活的遗址，这些遗址颠覆了我们对智人行为和能力的看法。滨海的布隆波斯（Blombos）洞穴里出土了为数众多的物品，类似物品通常被视为年代更近的人属物种所特有。当时生活在这个地区的智人善于利用海洋资源。他们用骨头或石头制作的尖锥捕杀登上海岸的海狮；他们采集贝类并在贝壳上钻孔，很有可能是为了制作项链；他们还在石头上刻画几何符号，这些符号也是人类历史上最古老的象征或美学作品之一。

另一个证据来自现代人的线粒体DNA（mtDNA）。在对比了全球各地采集的线粒体DNA后，人们发现，现代人的线粒体DNA来自生活在大约20万年前（后重新测定为距今17万年至10万年之间）的共同祖先。换句话说，我们或许能够追溯到全体人类的祖先了！至少，当上述研究结果在1987年公布于世时人们是这样宣称的。很快，这个共同祖先就被冠以“线粒体夏娃”的绰号。随后，对Y染色体DNA的分析研究让人们找到了生活于大约14万年前的“亚当”。

实际上，即便线粒体夏娃将她的线粒体遗传给了全体现代人，她也并非所有人类的祖先，也不是第一个女性智人。与线粒体夏娃同时代的其他女性也属于我们的直系尊亲，只不过她们的线粒体在她们通过儿子而非女儿参与种族繁衍的过程中被清除了。线粒体夏娃的唯一特殊性，在于她是我们现在可以通过母系血统追溯到的唯一女性。尽管如此，线粒体夏娃还是证明了人类非洲起源的唯一性，在非洲也观察到了最为多样化的线粒体。Y染色体亚当也是一样，他确定了人类的父系血统，当然了，这一血统也来源于非洲。

线粒体DNA

线粒体DNA指线粒体内含有的DNA。线粒体存在于大部分细胞内，是细胞内部化学反应所需能量的制造过程所不可或缺的细胞器。线粒体DNA的特殊性在于，它只能通过女性一代一代传递下去。实际上，在受精时，精子的线粒体会消失，只有来自母亲的线粒体会遗传给后代。因此，通过分析线粒体DNA就能追溯物种的母系血统。

对于人类，同样可以跟踪Y染色体携带的DNA，因为Y染色体仅能通过父系遗传。

在实际研究中，分析的对象是男性单倍群（haplogroup）或女性单倍群，即Y染色体或线粒体含有的DNA的特定片段，人们会对现代人或化石的单倍群的DNA序列进行对比。

这便是所谓的“走出非洲”模型或“伊甸园”模型，得出这些研究成果的研究人员和报道它们的记者显然受到了“伊甸园”这一《圣经》用词的启示。《新闻周刊》（Newsweek）杂志曾经以《追寻亚当和夏娃》为标题出刊，并配有一对非洲黑人夫妇的插图，这幅插图可把有些读者给惹恼了！

不过，即便这种方式能够回溯人类的历史并确定人类的起源，也未必就能确定智人曾经有过人口危机。其实，所谓的瓶颈期可能是文化层面上的，比如说，某些智人是否比其他智人更倾向于过游民生活（并在接下来的人类历史中发挥极为重要的作用），或在种群繁衍上取得了更大的成功。

起源问题

现存人类种族之间在外表上的差异曾使史前史学家提出人类多重起源假说，即每个“人种”——黑种人、白种人、黄种人——分别是一种猿的后代（参见第177页《人类种族存在吗？》）。不过，这种观点与现代进化理论并不相符。其实，随着时间的推移，来自同一祖先物种的多个姐妹物种会变得越来越不一样，以至于最终不能彼此交配繁衍后代。尽管类似的生活方式偶尔会使不同物种产生类似的外表或行为，但这种趋同现象并不能让它们彼此融合或形成单一物种。如果存在多种猿类且每种猿类分别演化形成了一个拥有巨大脑袋的双足行走的亚种，那么这些亚种之间的差异会比父代物种之间的差异更大。这些亚种也不会彼此融合为新的单一物种，因为经过数百万年的分化后，这种杂交已经不具有基因上的可能性。所以，黑猩猩和猩猩不能互相杂交，它们的后代也不能。

到了20世纪60年代，人类“多重起源”观点卷土重来，不过这次的形式不像之前那么极端了。有些人认为，原始人在一两百万年前出现在非洲，随后逐渐散布到整个欧亚大陆，并在各地形成了当今世上的“各大人种”。他们以“枝形烛台”模型来解释这种假说。源自非洲的现代人散布到世界各地以后，通过多次杂交一点一点地将当地的原始人变成了现代人。

智人多地起源假说——“枝形烛台”模型

有些中国古人类学家持这种观点，并得到了一些美国和法国古人类学家的支持。这些中国古人类学家一心想要证明，亚洲人自古以来便扎根于亚洲，并没有接受来自非洲的现代人基因，或者仅接受了有限的外来基因。所以，1978年于中国陕西发现的距今26万年的大荔人头骨被他们描述为“沿着亚洲连续的演化世系”从原始亚洲直立人衍生而来的原始智人。他们的依据是颅骨的一些解剖特征，但是这个假说里也含有其他考量。

另一些古人类学家则支持与此相反的人类“单一起源”假说。他们认为，人类是在更晚近的时代（距今不到10万年）走出了非洲，而且仅仅经历了短期的“隔离”。这就是所谓的“走出非洲”模型，现代人和化石的DNA分析结果大都支持这个模型。由于欧亚大陆上的原始人类种群都被走出非洲的现代人所替代，所以这种模型也被称为“替代假说”。

枝形烛台模型偶尔也会再次被推到台前。2006年发现的属于印度直立人的讷尔默达头盖骨化石，就曾被称为“印度现代人的可能祖先”。

同样，研究者针对最近在印度马索尔出土的年代非常久远的工具提出假说，认为它们由某种亚洲猿类的后代制造。在生物学层面上，很难想象人属居然出现在数百万年来与人亚族生物分隔两地的另一科灵长动物中。

智人单一起源假说——“走出非洲”模型

在人们发现了晚期智人（现代智人）曾经与走出非洲的第一批原始人的后代杂交后，多重起源观点也稍稍恢复了一点生命力。但是，即便能够解释某些解剖学特征或遗传学特征，这些杂交史曾发挥的作用似乎非常有限。来自其他物种的大多数基因都经过了严苛的筛选。

第七章　征服地球

在大约10万年前，智人“在解剖学层面上已经具有现代特征”，也就是说，他们的骨骼在各个方面上都与现代人的骨骼相似。正是在这个时期，智人走出非洲。这一次，在征服全球之前，他们不会停下自己的脚步！在这一过程中，智人将遇到另外一些人属物种，它们在智人之前便生存在地球上，并曾按它们自己的方式演化。

从非洲到美洲

紧随着不计其数的其他人属物种的脚步，智人也扩大了自己的狩猎范围并走出了非洲。迄今发现的最古老的智人遗址中，有以色列的斯虎尔（Skhul）洞穴和卡夫泽（Qafzeh）洞穴，其年代为大约12万年前至8万年前。智人曾在这两个洞穴里居住并埋葬死去的同伴。洞穴里发现了成人和儿童的骸骨，还有鹿角。一些骸骨曾用赭石上色，说明下葬时举行了葬礼。洞穴里出土的钻孔贝壳则被视为最古老的装饰品之一。

智人或许过去曾路过这里，或者来到这里的时间比我们想象的更早。2018年在以色列米斯利亚（Misliya）洞穴中发现的距今约18.5万年的半块颌骨似乎恰恰说明了这一点。另外，基因方面的数据也令人猜想，智人曾在距今20万年至10万年间数次离开非洲。然后，在大约7万年至6万年前，更庞大或者说更成功的一拨移民离开非洲并远渡重洋，在人类历史上首次抵达了澳大利亚和美洲的海岸。

一旦到了中东，就没有什么能够阻挡智人继续向东迁移的脚步了（尽管他们只留下了寥寥无几的迁移痕迹）。在历史上，他们必然曾经多次走过这条路。除此以外，还存在其他可能的迁移路线，比如取道直布罗陀，不过这条路线似乎在较晚的时候才被启用。另一些智人走的是“南路”，经由红海最窄之处的曼德海峡前往阿拉伯半岛，接下来，在横渡波斯湾以后，就有可能沿着海岸抵达印度和东南亚。当时的海平面比现在低，印度尼西亚的大部分地区都可以经陆路到达。

为什么人类再次踏上探索世界的征途呢？有些史前史学家提出假说，认为智人的这次迁移与印度尼西亚多巴火山的喷发有关。他们猜想，在大约7.5万年前，多巴火山的灾难性喷发导致了全球气温显著降低并且持续了很长时间。不过，无论是在火山喷发的年代上还是在火山喷发对环境和智人演化的实际影响上，这个假说的争议都非常大。

每次迁移事件，既不是单个猎人的个人行为，也不是整个种族的全员外逃。踏上迁徙之路的是规模不大的群体，每次只有几十个人，通常认为只有25人，差不多是6户人家，这也是以狩猎和采集为生的族群的通常规模。大多数踏上迁徙之路并走出非洲的族群无疑已经灭绝了。在中国发现的一些遗址，是智人早就到来的见证，不过，这些智人随后就灭绝了，没有留下子孙后代。

但是，另一些族群却繁衍壮大，成了今天人类的祖先，因为我们每个人或多或少都遗传了他们的某些基因。特别是，在我们的线粒体DNA内就能找到它们的踪迹（参见第123页《线粒体DNA》）。为此，人们定位了单倍群（即特定的DNA片段）上基因突变的准确位置。这些突变数量繁多，因区域而异。通过对比突变的序列，就能根据智人种群随着时间推移散居世界各地的情况来追溯突变的历史。研究发现，L3线粒体单倍群是由一个更加古老的单倍群发生突变后于8.4万年前出现在非洲的。人们在非洲发现了多种多样的单倍群，其中就有L3单倍群。世界其他地方的L3单倍群都是由非洲的L3单倍群衍生而来的，最初的变体出现在大约6.3万年前。换言之，现在非洲以外的所有人类都是一个携带L3单倍群的非洲智人种群的后代。

2015年，在湖南道县遗址出土了智人的牙齿，人们由此猜想智人或在大约10万年前至8万年前就已经来到了中国，尽管这个时间仍有争议。不过，一些智人确于大约6.5万年前至5万年前抵达了澳大利亚，他们想必是划着用树干挖成的独木舟漂洋过海而来的。或许，是雷暴引燃灌木丛产生的烟雾吸引了智人远渡重洋来到这块新的土地上？

再往北，来到了东北亚的智人也在大约1.5万年前趁着海平面降低徒步穿越了白令海峡。他们在抵达了彼时正处于冰川期的美洲后是怎么在恶劣的环境中继续探索之路的，我们不得而知。或许他们取道了两块大陆冰川之间的一条走廊？又或者，他们沿着海岸航行直到发现了较为温暖的海岸？无论如何，他们在南方发现了广阔无垠的处女地和数不胜数的猎物：有乳齿象（美洲的一种猛犸象），还有大群的野牛。这些智人就是后来的古印第安人，能制造燧石工具或精细切割黑曜石，他们的文化以美国新墨西哥州克洛维斯（Clovis）村的名字命名为克洛维斯文化。

他们中的一些人继续探索，直到抵达巴塔哥尼亚（位于今天的阿根廷）和火地岛（现在的南美大陆最南端的群岛）。有些史前史学家提出，他们迁移并定居这里的年代更早，应在大约3万年前。另外一些人甚至认为人类抵达美洲的时间还要再早，依据是在大约13万年前被石块砸开的乳齿象骨骸。

在东扩良久以后，智人于距今大约4.5万年的时候开始西征。彼时的欧洲正处于最后一个冰期，恶劣的气候或许减慢了智人的脚步，相比之下，亚洲南部的环境更加接近他们所熟悉的生活环境。一些研究人员认为，当时生活在欧洲的尼安德特人成了阻挠智人西征的另一个“障碍”，好在尼安德特人数量稀少，智人能够轻而易举地跨越这个“障碍”。

迁徙造就智人

如今，在我们身上的每个细胞和每个分子里，都能找到演化的痕迹。 ——弗朗索瓦·雅各布，1981

自大约10万年前起，尼安德特人也曾在近东地区活动，有时候甚至与智人生活在相同的地点。尼安德特人和智人制造相差无几的工具，也都有埋葬逝者的习惯。很有可能，这两个物种的男男女女就是在这个地区邂逅彼此并生儿育女的。其实，尼安德特人和现代人的基因组对比显示，不同人属物种之间曾经杂交繁殖，而这导致了物种之间的基因交换（参见第140页《史前DNA》）。

今天，非洲以外的智人携带着1%至4%的尼安德特基因。由于每个人携带的尼安德特基因不完全相同，遗传学家斯万特·帕博（Svante Pääbo）估计，尼安德特人基因组的20%至40%仍在我们体内延续。反过来，尼安德特人体内也有来自智人的基因。不过，尼安德特基因在智人基因组中比例很低的事实说明，尼安德特人和智人并未发生普遍的融合。或许，二者的杂交后代繁殖力低下，阻止了“外来”基因在物种中的扩散。

现代非洲人的基因组里没有这些尼安德特DNA，这就说明两个物种的种间杂交发生在智人走出非洲、移居欧亚大陆和美洲之后。留在非洲的智人未曾遇到尼安德特人，即便后来有些尼安德特基因通过从欧洲向非洲回迁的智人传到了北非。

进入智人体内的新基因随后发生了突变并改变了序列，进而变得与尼安德特人的初始基因有所不同。通过研究突变的数量，就能够确定杂交发生的年代。研究结果表明，杂交可能发生在大约10万年前，那时两者在近东地区比邻而居；抑或是在大约6万年前至5万年前，原先留在非洲的智人最终走出非洲之时。例如，根据4.5万年前生活于西伯利亚的一个智人的DNA研究结果，他的先祖曾在他出生前1.3万年至0.7万年就已经经历过杂交。罗马尼亚的欧亚瑟洞（Pestera cu Oase，意为“骨头洞”）里出土了生活于约4万年前的智人骸骨（这是欧洲已知最古老的智人），他体内的尼安德特DNA占比是现代人的3倍；从他往前追溯就能发现，他4至6代前的祖先还是尼安德特人！不过，这个智人似乎没有留下后代，因为现代人的基因组里已经没有了他的遗传特征。

人属物种的种间杂交繁殖不止于此。丹尼索瓦人也曾将它们的一些基因传给了智人，这些智人的后代后来移居到了澳大利亚、巴布亚新几内亚和菲律宾。在亚洲大陆居民和美洲原住民的体内也发现了丹尼索瓦人的基因，不过数量很少。更加惊人的是，遗传学家在丹尼索瓦人的基因组里发现了未知DNA的遗存，据猜测，这些未知DNA来自更加古老的人属物种，可能是亚洲直立人。同样，一些非洲民族的DNA里携带着明显源于其他依然不为人知的人属物种的序列，这些人属物种应当是在距今70万年时与海德堡人的先祖分道扬镳，最终在距今3.5万年时灭绝。

人属物种的杂交

上述杂交对智人可能是有利的，比如杂交使智人能够更快地适应高纬度地区更寒冷的环境。智人没有等待自然选择去利用偶尔发生的有利突变，而是直接利用了其他物种中经过数十万年的演化逐渐获得了必要适应性特征的既有基因。这种有用基因（或其等位基因）在物种间转移的现象，被称为适应性基因渗入。如果突变产生的新等位基因在种群中频繁出现，我们就视之为正突变。

我们的祖先利用了其他物种的这种“非自愿援助”，特别是作用于皮肤、免疫系统和消化系统等方面的基因。比如，在杀灭病毒过程中发挥作用的STAT2基因就是尼安德特人送给我们祖先的。直至今日，在欧亚大陆，10%的人仍携带这个基因，而在美拉尼西亚这一比例还要更高。尼安德特基因的引入，使我们的祖先能更好地抵御他们在非洲没有遇到过的不同微生物引发的感染。Toll样受体（Toll-like receptors，TLR）属于免疫系统蛋白质，至今仍奋战在抵抗细菌和寄生虫入侵的最前线；而在为Toll样受体编码的基因中，有两个源自尼安德特人，一个与丹尼索瓦人的基因类似。

中国藏族人似乎从丹尼索瓦人那里获得了有助于适应高原生活的基因。在寒冷的环境中，棕色脂肪组织能够产生热量。居于中国南部的纳西族以及生活在西伯利亚东北部的雅库特人和鄂温人都拥有在棕色脂肪组织的发育中发挥作用的TBX15基因，而这个基因也来自或许非常适应冰川气候的丹尼索瓦人。

我们每个人身上都带着尼安德特人的痕迹。 ——斯万特·帕博

我们DNA的一些区域受到尼安德特基因渗入（即基因转移）的影响甚小，要么是因为尼安德特基因提高了不育的风险，要么是因为尼安德特基因的存在会由于形态上或社会上的原因导致负向选择。X染色体携带着与男性生育能力有关的重要基因，它含有的尼安德特DNA微乎其微，似乎种间杂交产生的变化都已被自然选择所抹去。基因的携带者生殖能力较低的话，就不能将自己的性状遗传下去——自然选择往往就是这么简单！同样的，对语言能力至关重要的FOXP2基因区域里也没有来自尼安德特人的基因。可以想见，携带这种尼安德特式突变的智人将失去舌灿莲花的能力，也就很难找到另一半了（不过我们没有任何证据）。

另外，并非所有来自尼安德特人的基因都有益处，有些即便曾经有用，如今也已不再有用。SLC16A11基因来自尼安德特人，它的等位基因与罹患糖尿病风险的升高有关，在美洲原住民身上非常常见，在亚洲人身上也有发现。不过，这个基因在尼安德特人身上具有什么功效，我们就不得而知了。

史前DNA

1997年，遗传学家斯万特·帕博与同事完成了人类历史上首次尼安德特人DNA片段测序（参见第10页《DNA、基因、突变》）。自此以后，对古代DNA的分离与提纯技术取得了长足的进步。2010年，斯万特·帕博与同事分析了3个生活于距今约4万年的尼安德特人的基因组，证明了现代人的细胞内存在尼安德特人的DNA。2016年，DNA分析确认了西马·德·洛斯·乌埃索斯骸骨坑内发现的可追溯至距今43万年的骸骨实为前尼安德特人，并确定了它们的起源。

在人类中，据估算每个核苷酸每年的基因突变率约为0.5×10^-9。根据两个基因组之间的差异，可以计算出二者开始分化的时间。显而易见，估算结果只是近似值，不过可以借助化石的年代加以校准。
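上面的计算思路可以用几行 Python 粗略示意。这里假设两条谱系自分化以来各自独立地以相同速率累积突变，因此两个基因组之间每个位点的差异比例约等于“2×突变率×分化时间”。函数名 `divergence_time` 和示例中的差异位点数都是为演示而虚构的，只有突变率 0.5×10^-9 取自正文。

```python
# 正文给出的估算值：每个核苷酸每年的突变率
MUTATION_RATE = 0.5e-9


def divergence_time(differing_sites, compared_sites, rate=MUTATION_RATE):
    """根据两个基因组之间的差异估算二者开始分化的时间（单位：年）。

    假设两条谱系以相同的恒定速率各自累积突变，
    因此差异比例 p 约等于 2 * rate * t，即 t = p / (2 * rate)。
    """
    if compared_sites <= 0:
        raise ValueError("compared_sites 必须为正数")
    p = differing_sites / compared_sites
    return p / (2.0 * rate)


# 虚构的演示数据：每比对 100 万个位点，有 600 个位点不同
years = divergence_time(differing_sites=600, compared_sites=1_000_000)
print(f"估算的分化时间约为 {years:,.0f} 年")
```

这只是数量级上的估算：它忽略了同一位点的重复突变以及祖先种群内部原有的差异，实际研究中还要借助化石的年代加以校准。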

对古代DNA进行分析还能获得人口方面（通过每个基因的等位基因的多样性）和社会方面（比如给定社会里的近亲结合程度）的信息。

旧石器时代晚期的文化

在距今大约4.5万年，晚期智人来到了欧洲。在同一时期，工具的制造发生了重大变化，从尼安德特人（及较古老的智人）的莫斯特文化过渡到了奥瑞纳文化（Aurignacian）。对史前史学家而言，人类文明从旧石器时代中期过渡到了旧石器时代晚期。

智人发明了新的切削技术，可以将石核加工成大量细长的船底形石叶或小石叶。他们制造了多种多样的工具，比如刮削器、端刮器、石锥、雕刻器等等。此外，智人还用硬质动物材料（如骨头或象牙）制作标枪枪尖用于狩猎，史前史学家由此观察到了人类与其他生物的决裂；这些学者认为，尼安德特人不使用硬质动物材料制造武器，因为它们不愿使用以动物身体材料制成的武器猎杀猎物。

这些新欧洲人依然以狩猎采集为生。根据在目前仍以打猎和采集为生的极少数部落（如卡拉哈里沙漠的桑人或亚马孙流域的美洲原住民）中观察到的结果，可以猜想那时候只有男人猎杀大型动物。最为常见的猎物是驯鹿，不过人们也发现了大量其他动物，如马、原牛、盘羊、犀牛、猛犸象等，各遗址发现的动物都有所不同。女人则捕捉小动物（如蜗牛、蜥蜴、鸟等），采集鸟蛋，捡拾贝壳。此外，她们还会采集各种植物、块茎、可食用块根、野果、蘑菇等。尽管打猎提供了大量的肉类和脂肪，但女人的采集收获往往在智人的食物中占据较大的比例。

各个地区和时期的工具、武器和日用品有所不同。根据史前史学家的划分，欧洲先后出现了以下文化。

奥瑞纳文化（距今4.5万年至2.6万年）：将燧石切割成狭长石叶的技术已经普及，用木头或鹿角制成的“柔软”手锤也被普遍使用。与石质手锤相比，木质手锤或鹿角手锤精度更高，智人可以用它们敲打燧石块制造石片。人们还发现了用牙齿或贝壳制成的首饰。人类历史上最古老的小雕像也诞生于这个时期，比如德国福格尔赫德出土的动物牙雕或者霍伦斯坦因-施塔德尔洞穴发现的狮子人牙雕。或许，狗的驯化也可以追溯到这个时期。

属于奥瑞纳文化的工具

牙雕小马（德国福格尔赫德）

格拉维特文化（Gravettian，距今2.7万年至1.9万年）：工具以带柄长直石叶为典型代表。在遗址里发现了被称为“维纳斯”的女性小雕像，雕像往往造型非常夸张，可能是生殖力的象征，比如在奥地利发现的维伦多夫的维纳斯和在法国朗德省发现的布拉桑普伊（Brassempouy）妇人小雕像。

维伦多夫的维纳斯（奥地利，距今2.5万年）

梭鲁特文化“月桂叶形”燧石叶

梭鲁特文化（Solutrian，距今2万年至1.6万年）：在这个时期，生活于法国和西班牙的智人制造细长的“月桂叶形”燧石叶，并采用压制法而非锤击法加以精修。最大的石叶可能用作装饰或象征威望。他们还发明了投掷器，能以较高的准头将标枪投射至很远的距离。在这个时期的遗址里，还发现了欧洲历史上最早的骨针。

马格德林文化（Magdalenian，距今1.7万年至1万年）：马格德林文化分布甚广，且有多个变体，从葡萄牙至波兰皆有发现。这个时期的工具愈加精巧且多样，出现了用作箭头的三角尖形器。当时的智人能用骨头或象牙制作鱼叉，还能制作鱼钩。他们还用驯鹿角制成“穿孔棍”，或许是用来将受热弯曲的木制标枪矫直，抑或是用来拉紧帐篷上的绳索。他们还制作了乐器，比如用鸟骨做的穿孔骨笛。

某些属于这个时期的遗址反映了当时人类的生活面貌，不过我们却很难将这些人与史前时期挂钩。俄罗斯的松基尔（Sungir）遗址可追溯至距今3.2万年。在这个遗址里，埋葬着一个成年男人和两个青少年的骸骨。下葬的时候，他们身穿兽皮衣服，上缀数以千计由猛犸象牙雕成的珠子，每颗珠子的制作都得花上至少一个小时的工夫；腰缠饰以狐狸犬齿的腰带；还戴着象牙手镯、贝壳项链和垂饰。墓穴中还摆放了象牙标枪、武器和小雕像作为陪葬品。这些惊人的财富说明了墓穴中的三人生前拥有很高的社会地位，也说明了他们生活在一个组织严密、阶级分明的社会里。DNA分析结果显示，这三个人有亲缘关系，但并不是直系亲属。

克罗马农

在当代人的想象里，“克罗马农”几乎是“史前人类”的同义词。实际上，克罗马农是法国多尔多涅省韦泽尔山谷中的一个天然洞穴的名字。1868年修建公路时，人们在洞穴中进行挖掘，发现了5个人的骸骨、石质工具和动物骨骼，之后，史前史学家路易·拉尔泰（Louis Lartet）对其进行了描述。

这处遗址是个墓葬，共埋了5个智人的骸骨，其中3个男人、1个女人、1个儿童，年代为大约2.8万年前。3个男人中，一个身高接近1.8米，肌肉极为发达。由于他的牙齿已经全部掉光，人们给他取了个绰号叫“老头”，不过他死亡的时候可能只有50来岁。一同出土的工具则属于奥瑞纳文化。

由于这些化石名气甚高，“克罗马农”这个名字便在很长时间里被用来指称生活于距今4.5万年至1万年间的旧石器时代晚期的人类。如今，人们多使用“智人”或“解剖学意义上的现代人”这两个名称。

与之前的时期不同，旧石器时代晚期出现了大量描绘动物的作品，或涂或刻，以各种材料为载体。男人（或女人）雕刻木头、骨头和象牙，并在岩壁上涂画壮观的壁画。尽管尼安德特人似乎也曾作画，但岩画创作在旧石器时代晚期变得更加频繁。

然而，绘画风格并无显著发展。肖维（Chauvet）岩洞的壁画创作于大约3.5万年至3万年前，远早于创作于距今1.7万年的拉斯科（Lascaux）洞穴岩画，但前者所表现出来的智力水平和艺术才能与后者完全相同。尽管欧洲最先发现并研究了岩画，但岩画艺术并非欧洲独有。在印度尼西亚的苏拉威西岛多个洞穴的岩壁上，发现了距今4万年时画上去的手印和动物。有可能，生活于当时人类疆域两端附近（从西欧到澳大利亚）的智人独立完成了各自的第一批艺术作品。不过，也有可能，岩画创作只是随着智人移居世界各地时传播开来的一项古老传统。

澳大利亚原住民素有在峭壁上和不深的洞穴里绘画的传统，而且将这个传统延续至今。他们会定期翻新古老的作品，所以无法准确确定作品的初创时期，不过，画上沉积的赭石和黑赤铁矿石可追溯至距今5万年至4万年。也许有一天，我们会发现澳大利亚第一批居民的画作呢。

在他们的作品里，有些描绘的是关于人类起源的原住民神话，有些讲述的是他们群体生活的某些场景。各地的岩画或许具有不同的含义。欧洲的岩画以动物为主角，并配有各种几何符号、手印和女阴，人的形象少得可怜。某些岩画似乎与狩猎有关（比如拉斯科洞穴岩壁上受伤的野牛），但狩猎并不是非常重要的创作主题。岩画上的动物中有狮子和鬣狗，不过它们并非用来食用，而作为主要猎物的驯鹿，出现的数量却少得可怜。

肖维岩洞石壁上的原牛、马和犀牛（法国，距今3.3万年）

一些洞穴的污泥中留下了脚印，比如法国的佩什梅尔（Pech Merle）洞穴或蒂克·德·奥杜贝尔（Tuc d’Audoubert）洞穴，脚印的大小说明曾有年轻人进来过，可能是为了进行启蒙教育。尽管我们提到的往往都是男性“艺术家”，但是，根据岩壁上的手印（以嘴吹赭石的方法绘制），女性似乎也参与了岩画的创作。

新人类？

旧石器时代晚期的艺术作品突出表现了智人生活的巨大变化：他们探索的疆域远超前辈曾经抵达的边界。与此同时，由于新技术或新文化习俗的出现，日常用品的制作也迅速发生了改变。而在过去的几十万年里，制作技术未曾有过大的变动。

这些翻天覆地的变化是智人过往历史的简单延续吗？还是说，智人的演化经历了一次质的飞跃，否则该怎样解释这种突飞猛进呢？人们猜想，在大约5万年前至4万年前，智人的创造能力和语言能力由于脑组织结构的改变而提高，进而引发了一场迅速席卷全球的“人类革命”。

然而，智人突然之间取代尼安德特人，成了在欧洲发生的主要变化。如果摒弃传统史前史学的欧洲中心论，同时以同样的重视程度审视世界的其他角落，就会发现亚洲和非洲所经历的是渐进式的过渡。在过去，一些信号被视为从尼安德特人的旧石器时代中期向智人的旧石器时代晚期过渡的标志；而近些年来，不计其数的考古发现否认了它们与此过渡进程的相关性。在奥瑞纳文化诞生前，生活在非洲的智人就已经在制造骨质尖状器了，还能用针缝制衣物，佩戴项链或其他饰品，以及在洞穴岩壁上作画（参见第122页提及布隆波斯洞穴的段落）。

上述两种模型并非截然不同。智人的很多新行为，其实在过去就已出现，只是形式没有那么丰富罢了。显然，在深入地下洞穴绘制无与伦比的岩画前，肖维岩洞里的创作者曾花费数年光阴在洞外学习绘画技术、改进绘画姿势，但是他们的学习过程并没能保留下来。同样，虽然他们的前人也没有留下任何遗迹，但他们的行为或许只是在延续一项非常古老的传统。

晚期智人

我们偶尔会用Homo sapiens sapiens（即“晚期智人”）这个称呼，不过，重复两遍sapiens（本义为“聪明的”），不但累赘，更显自负，那为什么会起这么个名称呢？在原则上，拉丁文三名法用于物种的亚种；所谓的亚种，指与同一物种的其他种群存在地理隔离且表现出不同特征的种群。人们假定（或者已经证实），被称为亚种的种群可与同一物种的其他种群互相交配并繁殖可育后代。“亚种”的说法有时很实用，尽管“种”的概念本身已然很复杂且有争议。

在古生物学上，往往很难赋予化石物种精确的种名，亚种的定义也就没有任何意义，因为无法证实已经灭绝的动物是否能够互相交配并繁殖可育后代。不过，在考察物种时，不但要从空间的维度考虑，还要从时间的维度考虑；亚种的概念，不但有助于凸显化石之间的相近性，还有助于设想它们之间存在直接亲代关系。不过，这么一来，就要考虑不断变化的物种定义的问题。而在此基础上，还要考虑亲代关系的问题；但是，由于通常情况下根本无法建立亲代关系，所以演化分类时不将其纳入考虑。

史前史学家引入智人这个名称，是为了与尼安德特人做区分；那时的学界还将二者视为同一物种。当时，尼安德特人被称为尼安德特智人，而将尼安德特人变成智人的近亲，也算是为尼安德特人“正名”。今天看来，尼安德特人和智人之间互相交配并繁殖可育后代的能力似乎非常有限，仍将二者归为同一物种已成无稽之谈。所以，我们将二者加以区分。

不过，一些古人类学家意欲将智人分为早期智人和晚期智人（现代智人）两个亚种。所以，埃塞俄比亚赫托发现的可追溯至距今16万年的颅骨被命名为长者智人（Homo sapiens idaltu），这个名称说明他与解剖学意义上的晚期智人相近但有所区别。长者智人被视为罗德西亚人和智人的过渡种。长者智人，尽管字面意思似乎已经非常明确，但其定义并不明确：长者智人在何时变成晚期智人？判断标准又是什么？

如果长者智人向晚期智人的转变非常迅速，比如经历了生物学和文化两个方面的质的飞跃，那或许能够确定转变发生的年代和方式。

第八章　史前时代的结束

随着最近一次大冰期的结束，气候再次改变了人类的演化历程。新的文明，也就是我们现今的文明，取代了旧石器文明。正是在这一时期，人类开始改变环境：森林变成了农田，奶牛替代了原牛。在大约1万年前，当最初的牧民开始建造最初的村落时，我们生活的这个世界诞生了。

中石器时代

大约1.5万年前，全球气候开始变暖。尽管有过最近一次突然袭来的大冰期，全球变暖仍在1.2万年前变成常态（我们现在仍处于温暖的间冰期）。几百年间，地球平均气温升高了8摄氏度，大气也变得更加湿润。撒哈拉沙漠成了稀树草原，欧洲则森林遍布。巨大的冰盖融化产生的水涌入海洋，导致海平面上升了120米。

在中石器时代，以打猎和采集为生的智人适应了与其生活在旧石器时代晚期的祖先迥然不同的生活条件。较为温暖的气候深刻地改变了地球的面貌。冻原和荒原消失不见，松树林和橡树林先后取而代之。一些动物，比如原牛和马，适应了新的生活环境；另一些动物则消失了。驯鹿迁往北边，猛犸象从此灭绝，取而代之的是鹿、野猪和野兔。比起之前的冰期，野生动物更加丰富多样，这使我们的祖先得以长时间定居在同一个地方。

对于中石器时代猎人的生活方式，我们知之甚少，因为当时的环境条件不利于遗址的保存。不过，我们还是发现了重大的文化变迁。当时的智人能将石头加工成主要用作箭尖的“小石叶”。由于在森林里弓箭比投掷器更加实用，所以弓箭的使用相当普遍。在法国，岩画艺术似乎走向了倒退；在西班牙，却诞生了新的岩画风格，作者非常乐于在作品中表现人的形象。

在海边，贝类采集几乎具备工业规模，堆积在海岸上的贝壳就像一座座沙丘。他们还用编织的渔网或捕鱼篓捕鱼，建造独木舟在江河湖海上航行。也是这个时期，人类首次定居在科西嘉和克里特等地中海岛屿。

西班牙东部的岩画作品（中石器时代）

大型动物的灭绝

在冰期结束时，大量物种灭绝，尤其是那些被归为大型动物的物种，即体重超过45千克的动物。由于体形较大，它们在考古遗址中的消失是显而易见的。这次灭绝是全球现象，从欧亚大陆的猛犸象，到南美洲的大地懒，还有澳大利亚的袋狮，都未能幸免。

几十年来，两个灭绝假说一直针锋相对，那就是气候变化假说和人类活动假说。前者认为，气候变暖改变了植被状况。然而，食草动物往往比较专一，吃草的猛犸象不能改为吃树叶。驯鹿等物种已经北迁，以寻找可以接受的生存环境，但对于猛犸象和长毛犀牛来说，这是不可能的，因为气候变暖已经导致适合它们生存的寒冷荒原消失殆尽。

然而，上面这些并不足以解释全部的物种灭绝事件和灭绝速度。对于人类活动假说而言，单单看到人类到来和某个物种消失之间的模糊巧合是远远不够的，还要证明人类的的确确猎杀了这个物种。除此以外，还需要确定人类到来和物种消失的准确年代。如果物种灭绝在人类到来之前，那人类就与物种的灭绝没有任何干系。如果物种灭绝在人类到来之后，那人类在物种灭绝中负有责任的可能性就会增加，但这未必就是确凿无疑的事实。

体形大的物种往往繁殖率较低，而对繁殖率较低的动物而言，哪怕很低的猎杀压力也足以导致它们灭绝，牛顿巨鸟就是个很好的例证。牛顿巨鸟是生活在澳大利亚的一种不会飞的鸟，体重超过200千克。2015年，在200多个距今5.4万年至4.3万年的遗址上，发现了具有炭化痕迹的牛顿巨鸟的蛋壳。然而，要在蛋壳上留下类似的炭化痕迹需要很高的温度。因此，有些人认为，这些痕迹排除了仅仅是灌木丛起火这一种可能性。人类收集鸟蛋（或许还猎杀成鸟），似乎成了导致牛顿巨鸟灭绝的原因。另外，澳大利亚还生活着另一种名叫鸸鹋的善于奔跑的走禽。虽然人类也食用鸸鹋的蛋，但这种体形比牛顿巨鸟小很多的鸟并未灭绝。

在同一时期灭绝的物种还有重达半吨的巨袋鼠、重达2吨的巨袋熊和身长达7米的巨蜥（与科莫多巨蜥有亲缘关系，体形为科莫多巨蜥的3倍大）。它们或许不是被澳大利亚的第一批居民直接消灭的，但此间的巧合着实令人不安。

史前巨袋熊复原像

同样的故事也发生在许多岛屿物种身上，比如新西兰的恐鸟和马达加斯加的象鸟。同样未能逃过一劫的，还有北美洲的乳齿象及南美洲的雕齿象和大地懒。不过，雕齿象和大地懒的种群在人类到来之前就已经因为气候变化而变得脆弱不堪。

在人类定居于新发现的岛屿和大陆前，生活在那里的动物与人类从未有过接触。即便不像南太平洋的物种那样一动不动地看着水手靠近并杀掉自己，它们也毫不适应人类这个新型掠食者的狩猎技术。非洲和欧亚大陆的情形则与此不同，在气候变化中躲过一劫的物种没有再遭遇其他不测，最终存活了下来（直至现代人对它们展开了血腥的大屠杀，从鲸到犀牛都是如此；这里仅举几例大型动物）。

新石器时代革命

在一些地区，比如近东，中石器时代更像是个过渡期。在大约1万年前，生活在这些地区的智人渐渐转为定居，并用原生黏土建造了人类最早的房屋。他们依然像从前一样栽种作物，有豌豆、扁豆、小麦、黑麦，不过采用了更加系统化的栽种方式。他们制造了必需的工具——带有燧石刀刃的木柄镰刀，并挑选了最适应他们的播种技术或收割技术的品种。在打猎的同时，他们还开始饲养动物，先是盘羊和野山羊，然后是原牛和野猪，后面两个最后被驯化为奶牛和家猪。

在地中海东岸（包括以色列、黎巴嫩和现土耳其的一部分）及底格里斯河和幼发拉底河流域（叙利亚和伊拉克），考古学家发现了这些人类活动留下的无数遗迹。这个地区呈新月状，土地肥沃，物产丰富，因而得了“肥沃新月”的美称。稍晚以后，世界其他地方的智人经历了相同的过渡期，不过他们栽种的作物和饲养的动物都有所不同：作物有土豆、水稻或高粱，动物则有火鸡、羊驼或骆驼。

这种全新的生活方式与过去的截然不同，以至于人们将两种文化间的过渡期称为“新石器时代革命”。以打猎和采集为生的迁徙部落向以耕作和养殖为业的定居农民的转变尽管花费了数千年才完成，但是对自然环境和人类自身都产生了极为重大的影响。

在新石器时代，智人继续加工燧石制造“小石叶”，然后将小石叶挨个摆放整齐，用来制成镰刀和小刀的刀刃。此外，他们还制造石斧并对其进行打磨（新石器时代过去也被称为“磨制石器时代”）。再往后，他们用翡翠（一种在阿尔卑斯山脉发现的绿色石头）制作礼斧，而礼斧之后将在从西西里岛到爱尔兰的整个欧洲范围内流通。

他们早就知道怎样把黏土塑造成型，还懂得通过加热使其硬化。定居之后，他们制造了陶器以储藏谷物，这就降低了单纯依赖野生作物作为谷物来源的供应风险。不过，食用谷物也造成了一些后果。为了获得面粉，就需要磨碎谷物颗粒。妇女承担了这项任务。她们跪在地上，用石杵将谷物颗粒在磨盘上碾碎，一碾就是几个小时。长时间的碾磨在她们的骨骼上留下了痕迹，引起了脊柱和大脚趾变形。另外，臂骨结构说明她们的手臂肌肉和当今的划船冠军一样强健有力，而她们的脊柱由于头部长时间承受很大的负荷而发生了形变。

杰尔夫·阿合玛尔（Jerf el-Ahmar）遗址（叙利亚，距今9 000年）

随着时间的推移，村落里的人口数量逐渐增加。由于人们不再频繁迁移，生活垃圾慢慢地污染了水源。霍乱和斑疹伤寒等疾病变得愈加严重。与动物杂居一处，也成了寄生虫和细菌传播的重要原因。苍蝇和家鼠渐渐适应了这种对它们生存非常有利的环境，寄居于人类粮仓的老鼠则成了寄生虫和多种疾病的传染源。

在新石器时代，智人的牙齿饱受新食物之苦。由于唾液中含有淀粉酶，谷物中的淀粉自入口时便开始消化，消化产生的糖类导致龋齿。在这个时期的骸骨上，能观察到明显的龋齿数量的增加。

基因变化

我们或许会认为，比起数百万年的人亚族历史或数十万年的智人历史，仅数千年的新石器时代在人类演化过程中没有发挥任何作用。不过，在短暂的新石器时代里，人类经历的生活方式变化产生了强大的选择压力。

从体格上看，与祖先相比，新石器时代的智人身材较小，不过这似乎并不是基因演化的结果。生活条件的变化，比如较大的劳动强度（对于孩童也是一样）或虽然丰富但与机体不相适应的饮食，足以对此加以解释。

也正是在饮食方面我们观察到了显性基因变化。我们出生时能够制造乳糖酶，这种酶能分解母乳中含有的乳糖。在随后的发育中，我们将失去制造乳糖酶的能力。由于哺乳动物成年后原则上不再食用奶，肠道细胞便不再制造失去用处的乳糖酶。

新石器时代，农民饲养绵羊、山羊和奶牛，它们提供的鲜奶是有益食品。不过由于缺乏乳糖酶，成年智人不能很好地消化吸收鲜奶。在人类细胞里，负责制造乳糖酶的是LCT基因。大约8 000年前，生活在高加索地区的一个智人的LCT基因发生了突变。这改变了LCT基因的活性，使它在成年智人体内仍能正常制造乳糖酶。这一突变在欧亚大陆的智人种群中迅速扩散。如今，75%的欧洲人体内都有这个突变。在至少四个非洲智人种群（比如马萨伊人）中，也独立发生了类似突变。

这些突变的快速选择证明，通过遗传从父母处得到突变的人确实具有演化优势，存活率也大大提高。所以，可以这样猜想：在年成不好的时候，奶可以作为智人（包括成年智人）的重要食物。另一种假设是，奶可以提供维生素D。既然智人有可能因为继承了祖先的深色皮肤而无法制造足够的维生素D，那么他们就要依赖动物奶来满足自身需求。

在谷物消化方面也发生了类似现象。谷物富含淀粉，在淀粉酶的作用下，淀粉可以转化为糖类。一些地区的农民适应了这种饮食，与狩猎采集者相比，谷物成了他们更加重要的食物来源。他们的后代比祖先携带了更多拷贝的AMY1基因，进而能够制造更多的淀粉酶并以更快的速度消化淀粉。

迁移

农民需要木头建造房舍、烹煮食物，不久以后，他们还要燃烧木材烧制陶器。为了获得木头，他们砍伐了村落周围的树林，又不留给树足够长的生长时间以恢复树林。我们已经发现，在某些遗址里，房梁的直径呈逐渐减小之势。山羊的数量越来越多，也对树木的再生造成了危害。在每年播种前，农民都会通过焚烧清除土地上的植被（即所谓的“刀耕火种”），最终导致村落周围地区的沙化。然后，农民就会遗弃旧的村落，另觅环境退化较不严重的地方建设新家园。

此外，随着游民生活走向终结、食物供应日益稳定，农民的人口数量也与日俱增，这就需要更多的土地播种作物、饲养动物。于是，农民开始从近东地区向各个方向迁移，并将他们的技术传遍各地，尤其是位于西北方向的欧洲。

根据考古学家的描述，农民的迁移主要有两条路线。一些人沿着地中海北岸迁移，最终抵达了西班牙。他们留下的遗址里有饰有几何图案的陶器，这些图案是他们用名为鸟蛤（Cardium）的软体动物的壳压印而成的，正因如此，他们的文化得名为“鸟蛤陶文化”。他们带着饲养的绵羊和山羊，将小麦、大麦和扁豆带到了欧洲。

在北边，另一些人顺着中欧的多瑙河迁移，最终抵达了布列塔尼。他们制作的陶器上带有不同的花纹，后世称之为“线纹陶文化”。他们的迁移给欧洲带来了奶牛和家猪。他们建造的房屋呈长方形，墙壁使用木材和泥土，房顶覆以茅草。

鸟蛤陶文化也好，线纹陶文化也好，它们其实代表了智人的移民潮。不过，这些移民活动极为缓慢，几乎察觉不到，用了4 000年时间才抵达大西洋岸边。在迁移的道路上，代表了新石器文明的农民遇到了以打猎和采集为生的人群。但这一次不存在杂交的问题了，因为他们都是智人。不过，他们说的语言不同，生活方式也完全不同。他们在多大程度上相互融合或相互冲突，现在已经不得而知。这两种情形或许都曾经发生；不过，在文化层面上，新石器文化在世界各地都成了主流，中石器时代的生活方式渐渐地消失了。

考古遗址见证了新石器文化在欧洲的推进过程。在法国阿韦龙省特雷耶（Treilles）发现的新石器时代墓地中，出土了24个埋葬于5 000年前的智人骸骨化石，他们的DNA就是印证。线粒体DNA和Y染色体给出了他们母系和父系血统的相关信息。根据研究结果，这24个人都是近亲，父系血统起源于地中海，可能来自土耳其的阿纳托利亚，母系血统则可以追溯至在旧石器时代生活于法国的人类种群。

在两种文明的碰撞和冲突中，代表了新石器文明的农民似乎更加暴力。人们在德国的塔尔海姆（Talheim）发现了7 000年前发生的屠杀留下的遗迹，男人、女人、孩童共计34人惨死于弓箭或石斧之下，骸骨上的伤痕正是新石器文明制造的武器造成的。而在法国阿尔萨斯地区阿克奈姆（Achenheim）的一处遗址里，出土了6个人的化石遗骸，他们死于斧子击打造成的多处骨折。凶手把他们的左臂都砍了下来，要么是作为战利品，要么是为了证明自己高效的屠杀能力。但是遗址里没有发现女人的骸骨，这或许意味着凶手的突袭并未大获全胜。村落里发现了300座储存粮食的筒仓，或许这就是凶手发动袭击的原因吧。在旧石器时代的遗址里，带有箭伤的骸骨并不鲜见，但到了新石器时代，暴力留下的痕迹明显增加，这种情况或许与定居生活带来的财富积累有关。

新石器文明，我们社会的基础，曾经是一个希望。然而，直至今日，我们的历史仍未达到预想的高度。 ——让·纪莱讷（Jean Guilaine）

在德国黑克斯海姆（Herxheim）的线纹陶文化遗址里，考古学家发现了许多人类被烹煮和食用的遗迹。这种食人行为可能是仪式性的，用来纪念死亡的同伴，或是庆祝消灭敌人。

与新石器时代的开端一样，新石器时代的结束也是个渐进的过程，其间发生了多个重要但不同步的事件。比如，大城市的出现（在距今5 500年时即出现了拥有近5万居民的美索不达米亚古城乌鲁克（Uruk））就可以视为其标志性事件之一。最早的牛拉战车和最早的青铜器也在这一时期诞生。不过，史前时代结束和历史开始的真正标志是书写的登场：大约5 400年前，人类历史上最早的书写系统出现在近东地区。然而，新石器时代虽然在近东地区宣告结束，却在西欧继续延续了几千年。直到距今4 000年，西欧才告别了新石器时代，正式迈入了青铜时代。

结语　今天的智人

如今生活在地球上的70亿现代人，都是10万年前居住在非洲的几千个智人的后代。我们现在具有的大部分特征都是从这些智人身上遗传而来的，不过，自从分散到世界各地之后，我们的祖先并没有停止演化。他们生活在多种多样的环境中，与其他人邂逅，改变了自己的生活方式后又被生活方式所改变。我们现在拥有的多样性，正是这段历史带来的遗产。

过去的痕迹

人类在扩张至所有大陆后的几千年里，继续积累基因突变，以适应生活环境和加强文化特色。基因交流从未中断，尤其是在地理上相邻的种群之间。此外，许多事件也促进了基因组的重组，比如征服战争、探险活动、奴隶贸易、经济移民、旅游观光等等。

在近代历史（从地质学意义上说，为最近的5万年）上，人类产生了各种各样的差异：身高、肤色、体毛形态、糖尿病倾向等等。这些多样性里，一部分是人类适应环境而产生的，不过并非所有特征都是适应的结果。在与世隔绝的小种群里，比如在岛屿上，可能会发生遗传漂变现象（genetic drift），某些基因频率会在没有经历自然选择，也就是没有刻意适应环境的情况下发生变化。在所罗门群岛的美拉尼西亚人中就观察到了这种现象。那里的美拉尼西亚人都拥有深色皮肤和金色头发。这种所罗门群岛岛民独有的特征与TYRP1基因的突变有关，而在北欧居民身上发现的控制金发的基因则与此不同。
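遗传漂变的含义可以用一个极简的模拟来体会。下面是一个粗略的示意，假设种群大小恒定、世代不重叠，下一代的每个基因拷贝都从上一代中随机抽取，完全不涉及选择；函数名 `drift`、种群大小和世代数都是为演示而设的假设，与所罗门群岛的实际数据无关。

```python
import random


def drift(pop_size, start_freq, generations, seed=0):
    """模拟某个等位基因的频率在没有自然选择时的逐代随机变化。

    返回每一代的频率列表（含起始频率）。
    """
    rng = random.Random(seed)  # 固定随机种子，便于重复得到相同的结果
    freq = start_freq
    history = [freq]
    for _ in range(generations):
        # 下一代的每个基因拷贝以概率 freq 继承该等位基因
        carriers = sum(1 for _ in range(pop_size) if rng.random() < freq)
        freq = carriers / pop_size
        history.append(freq)
    return history


# 小种群（如与世隔绝的岛屿）与大种群的对比，起始频率均为 0.5
small = drift(pop_size=50, start_freq=0.5, generations=100)
large = drift(pop_size=5000, start_freq=0.5, generations=100)
print("小种群最终频率:", small[-1])
print("大种群最终频率:", large[-1])
```

多换几个随机种子运行，通常会看到小种群中的频率波动大得多，有时甚至直接固定在0或1；这正是某些特征无须任何适应性优势也能在孤立的小种群中变得普遍的原因。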

某些身体特征并非仅由基因决定。人的身高不仅取决于基因，还取决于童年时期的生活方式。因此，身高并不是完全由遗传决定的。的确，在20世纪，欧洲男性和女性的身高有所增加，但这并非人类演化产生的变化，而是生活方式改变的结果——儿童不再下矿工作，与过去相比，他们吃得更好，睡得更多。不过，这种改变是可逆的。如果回归19世纪的生活方式，那人类的身高或许会平均减少10厘米至20厘米！然而，人群之间的身高差异与环境适应是有部分关系的。因纽特人的矮小身材就与极地的严寒气候不无关系，因为这种身材能减少热量损失。但是，其他因素也发挥了作用，比如对某些身体特征的文化偏好。

肤色显然与环境有关。紫外线能引发皮肤癌变，皮肤里的黑色素能防御紫外线的伤害，而黑色素含量高的话，肤色就会较深。此外，黑色素还能避免叶酸的分解，无论是对孕妇体内胎儿的神经系统发育还是对男人精子的产生，叶酸都发挥着重要作用。与此相反，在光照强度较弱的地区，颜色较浅的皮肤有利于更好地合成维生素D，不过合成过程中还是需要一定量的紫外线的。

纬度和黑色素含量之间存在很大的关联。在人体内，黑色素的合成大约受10个基因的控制，其中每个基因都存在几种变体，各个变体的活性有高有低。通常情况下，自然选择根据当地的光照情况影响这些基因的分布。可是，演化从未跟上智人迁移的速度。所以，尽管在欧洲生活了成千上万个年头，人类在很长时间内依然保留了继承自祖先的深色皮肤。

这些情况，是切达人（Cheddar man）的基因告诉我们的。切达人生活于距今1万年的英格兰，彼时尚处于中石器时代。切达人有着深色的皮肤（因为黑色素含量很高）和蓝色的眼睛，与7 000年前生活在西班牙的另一批人毫无二致。（但两种人之间没有任何亲缘关系！）SLC24A5基因参与人体内黑色素的合成，在不久以后欧洲智人皮肤淡化的过程中，这个基因发挥了重要作用。其实，在大约6 000年前，随着第一批近东农民的到来，SLC24A5的等位基因Ala111Thr就在欧洲出现了。另一些研究表明，当时生活在斯堪的纳维亚的智人拥有较浅的肤色，这或许是来自中亚的外来基因造成的。在旧石器时代晚期，欧洲的智人尽管为数不多，但无疑表现出了很强的多样性。很有可能，克罗马农人的肤色远比我们通常想象的要黑得多。

这么看来，相对较低的光照强度似乎并未造成太大的选择压力，或许是因为以打猎和采集为生的智人从食物中获得了足量的维生素D。

相反，到了新石器时代，智人的食谱变得较为贫乏。当北方的智人转而从事农耕生活时，肤色变白就变得至关重要，这也是SLC24A5基因的变体在智人种群中迅速传播开来的原因。到了今天，95%的欧洲人体内都含有这个变体。

至于蓝色眼睛，或许是性选择的功劳。从遗传学角度来看，存在好几种不同的蓝色眼睛；不过，在欧洲，蓝色眼睛这一特征与1万年前至6 000年前出现的单一基因突变有关。但是，这个远不如肤色重要的特征为什么会被选择并遗传下来呢？这是因为，与蓝色眼睛相关的基因突变也在肤色变白过程中发挥了作用，尽管作用微乎其微。不过，这个理由似乎不足以确保这个突变的传播。会不会是这个突变位于某个重要基因附近，所以只是搭了后者的便车才得以遗传下来呢？虽然达尔文对这些基因一无所知，但他还是给我们提供了另一种解释。我们知道，稀有特征会带来非同寻常的吸引力。因此，拥有蓝色眼睛的人可能会留下更多的子孙后代，也就是更多蓝眼睛特征的携带者！

当然了，文化偏好在其他方面发挥了作用，比如现代人的体毛差异。在体毛方面，人类表现出明显的性别二态性，这无疑与我们远祖的偏好有关。不过，男人有胡子而女人没胡子，是因为女人偏爱有胡子的男人还是因为男人喜欢没胡子的女人导致的呢？同样，现代人今天具有的多样性，也正是人们择偶品味不尽相同产生的结果。

基因的多样性

人类的DNA由32亿个核苷酸排列而成，这些核苷酸是分子的组成部分。在这其中，只有大约500万个核苷酸是因人而异的。换句话说，任意两个人在遗传物质上的相似程度达99.6%。从基因角度考虑的话，人与人之间的差异程度比黑猩猩之间的差异程度要小。

不同种群在这些个体突变的频率上存在差异。个体突变的频率导致了种群之间的差异。没有任何基因突变只存在于一个种群中并且出现在这个种群的每个个体身上。换言之，任何个体变异都不是某个大陆或某个种群所独有的。

另外，许多研究结果表明，同一种群的个体之间的基因多样性要大于两个不同大陆上的两个不同种群之间的平均基因变异性。任何特定的种群内部都包含人类整体基因多样性的80%。尽管从外表上看不出来，但来自卡拉哈里沙漠的两个布须曼人之间的基因差异可能比一个欧洲人和一个亚洲人之间的基因差异更大。

个体之间的差异很小，但并非随机分布。尽管不存在某个种群独有的标记，但是，以个体变异的特定组合为基础，我们可以相当容易地将某个DNA归于某个大陆。与此相反，知道了某个个体的来源并不能让我们了解这个个体的基因情况。

对数千人进行的基因研究表明，人类存在几个大的地理类群。其中一项研究将人类分为以下7个不同类群：撒哈拉以南非洲人、欧洲人、中东人、中亚和南亚人、东亚人、大洋洲人以及美洲原住民。另一项研究则将人类分为以下3个不同类群：撒哈拉以南非洲人，欧洲、北非和西南亚人，亚洲其他部分、大洋洲和美洲人。

整体而言，基因相似性和地理邻近性之间存在很强的一致性。南北方向上观察到的基因变异性高于东西方向上观察到的基因变异性，这也与环境适应性随着纬度增加而更加明显的情况相符。

人类种族存在吗？

无论是在历史上还是在文化上，人类多样性的问题都关系到种族是否存在。显然，这是个非常敏感的问题，因为在人类的历史上，物种分类往往与划分种族等级甚至灭绝某些种族的企图有关联。政治利益或经济利益常常隐藏在伪科学的考量身后。

在德国纳粹于20世纪推行种族灭绝政策之后，联合国大会于1965年通过了《消除一切形式种族歧视国际公约》。在法国，种族并非官方认可的类别，官方甚至禁止进行任何将人群以“种族”划分的调查。在其他许多国家却不是这种情况，那里的居民必须明确自己的种族归属。通常情况下，我们已经不再按传统方式将人们划分为白种人、黑种人、黄种人这三大种族，而是将人们划入根据肤色和地理来源等标准人为构建的类别[比如在美国就存在黑人（非裔）、白人、西班牙裔、美洲原住民等类别]。

今天的生物学已经不再认可传统种族的存在，并将传统的种族划分视为毫无逻辑，不但无用还往往有害。然而不争的事实是，大多数人仍会提及种族。即便人类无法分类，但“人类种族不存在”的论断似乎也是在挑战基本常识。这么一来，科学可就站到我们对世界认知的对立面了（生物学并不是唯一出现这种情况的领域，地球和太阳的相对运动就是另一个例子）。

该怎么理解科学知识和一般感知之间的矛盾呢？在面对极为多变的集合时，我们本能地倾向于找出有利于进行信息组织的极端情形，偶尔会将数量上更多的中间情形抛在脑后。挪威人显然与日本人不同，我们也能够准确无误地将来自这两个群体的任何一个个体归入其中一个群体。不过，此举并不意味着给“挪威人种”和“日本人种”甚或“白种人”和“黄种人”下了定义，因为如此一来，就等于将从西欧到远东的其他民族都置于一边了，而他们显然不能被列入“挪威人种”或“日本人种”中的任何一类。

传统的“三大人种”划分依据的标准只有一个，那就是肤色，而肤色实际上是不同种群分别独立获得的。这意味着，非洲、印度和澳大利亚的黑肤色种群与各自比邻而居的浅肤色种群的亲缘关系比他们彼此之间的亲缘关系更近。

显而易见，最先试图定义种族的人类学家不得不先建立子类，然后再把子类继续细分，以至于产生了几十个“种族”，而最终这些“种族”还是不免与族群、种群或民族混为一谈。这些“种族”便成了所谓的“原始意象”（archetype）——它们以形态、地缘、文化、宗教标准建立，不具备任何精确性，而且没有任何生物学层面的事实依据。

生物学上的“种族”（race）概念

在生物学家眼中，race指野生物种内部与其他种群相互隔离且在大小、外形或行为上有明显区别的群体。这个术语差不多是品种（更多地用于植物）或亚种的同义词。这也是与亲本物种分化的一个阶段，最终可能导致新物种的出现。

race一词的另一个含义是指通过严格控制家养动物的繁殖，获得非常独特的动物，比如暹罗猫或奥布拉克奶牛。

这两个定义中的任何一个都不适用于我们人类，因为人类的繁殖并不受控，而且人类个体并非相互隔离。

不过，有些科学家和企业家提出，“种族”概念具有潜在的医疗利益。实际上，随着DNA测序新技术的诞生，人们可以考虑开发基于基因的个性化药物，这样既能虑及病人对不同病原体的易感性，又能虑及他们对治疗的不同反应。英国和中国正在实施的旨在建立巨型遗传信息库的“十万人基因组”计划正是希望实现这个目标。一些专门从事DNA测序的公司正在推动建立基因组图谱，以详细说明我们基因组中存在的所有潜在的有害突变。

某些疾病在特定种群中更加常见。比如，在阿什肯纳兹犹太人中，BRCA1基因和BRCA2基因上发生的特殊突变增加了乳腺癌的风险；在撒哈拉以南的非洲人中，能够引发镰状细胞性贫血的基因突变则更加常见（因为这个突变能够保护携带者免遭疟疾的困扰）。有些实验室提供“人种”检测，据称能让受测对象知道祖先的地理起源。制药业开发了针对特定种族的专用药，比如因为开发过程缺乏科学严谨性、概念模糊带来重大风险而在2005年引起很大争议的拜迪尔（BiDil）。

一棵交了好运的树上长出了一根出人意料的树枝，这根树枝上又发出了出人意料的枝杈，这个枝杈上又萌出了一个小小的细枝，这个细枝就是智人。 ——斯蒂芬·J. 古尔德，1989

不过，即便某些疾病只在特定种群中高发，也不能证明这些疾病是由基因决定的。还存在着与病人的社会背景或文化背景有关的可预见因素。在医学上，病人的直系尊亲是远比病人所属“种族”更有用的信息，何况“种族”更多是个文化概念，而非生物学事实。按这种逻辑，在欧洲或在美国，双亲分别为欧洲裔和非洲裔的儿童在文化上会被视为“黑人”，但这种划分并未给出一丁点儿生物学层面上的依据。

基因组学在医学上的另一个应用，是将人类的演化纳入疾病研究的考量范畴之中，这正是演化医学的基础。而演化医学的一个目标就是弄明白旧石器时代选择的基因可能对当今人类造成怎样的负面影响，毕竟我们的生活条件和饮食习惯都与祖先的截然不同。

The humans of the future

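Because human history has been accompanied by major anatomical and psychological changes, we cannot help imagining how humans will evolve in the future. We naturally think along two seemingly logical lines: first, the continuation of past transformations; second, adaptation to the modern environment and new living conditions. On this view, the humans of the future will have enormous heads, feeble bodies, highly developed fingers (for tapping on tiny keyboards), and eyes suited to looking at screens.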

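This view rests on a poor understanding of the mechanisms of evolution (see "Human evolution: Darwin vs. Lamarck", p. 19). Even if we really needed them, there is no reason we would acquire longer, stronger, more agile fingers unless natural selection came into play and favored the survival and reproduction of people who happened to have these traits (which would also have to be genetic in origin). For now these conditions do not seem to be met, and our fingers will probably keep their present shape and abilities for a long time to come.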

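As for the head, evolution is constrained by anatomy. We can imagine individuals with enormous heads (and enormous, better brains) having an advantage in daily life and leaving many descendants. But childbirth would then become more difficult, unless babies were born earlier, which would increase the risks of prematurity, or the pelvis changed, which would risk interfering with bipedal walking.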

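Another possible evolution, less obvious but often mentioned, concerns the wisdom teeth. Wisdom teeth are our third molars; they erupt late in development, often causing discomfort and requiring extraction by a dentist. About 20% of people develop only some of their wisdom teeth or none at all. The shrinking of our ancestors' jaws hindered the normal eruption of the third molars, causing cavities, swelling, and even damage to neighboring teeth or the jawbone. In the Paleolithic, such conditions could be fatal. Wisdom teeth were therefore under strong negative selection. Today that negative selection has disappeared, at least in developed countries. Without it, evolution no longer proceeds in any particular direction, even as mutations continue to accumulate.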

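Nothing suggests, however, that we are about to reach an evolutionary endpoint and stop changing. The world population now stands at 7 billion; human diversity has increased markedly since the Paleolithic; mutations keep accumulating in the genome; and natural selection is no longer as harsh as it once was. Infant mortality has fallen sharply, we produce drugs against deadly diseases, and modern medicine allows couples who would otherwise be infertile to have children. So what are the real evolutionary possibilities for humans?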

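Some traits remain subject to natural selection. Whenever a bacterium or virus causes an epidemic, some people turn out to be naturally resistant; in a severe epidemic, others die. Natural selection then acts brutally: some lose their lives and others survive. Once the epidemic is over, the difference in mortality means that resistant people make up a larger share of the population, and the population as a whole becomes more resistant to the disease.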

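AIDS, caused by the human immunodeficiency virus (HIV), is a case in point. The CCR5 gene plays a role when HIV attacks the body's lymphocytes, and its CCR5-Δ32 variant can block the virus. In western Asia and Europe, 10% of people carry the CCR5-Δ32 variant, which is thought to have been selected in the past because it protected against another deadly disease (perhaps smallpox). In Africa the variant is rarer, but in populations affected by AIDS it is now under strong selection because of the disease's high mortality.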
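
The arithmetic behind this kind of selection can be illustrated with a minimal sketch. It is not taken from the book: the function name, the starting share, and the death rates are all hypothetical, and the sketch assumes the resistance is heritable and that survivors pass it on in the same proportion.

```python
def resistant_share_after_epidemic(resistant_share, death_rate_resistant,
                                   death_rate_susceptible):
    """Share of resistant people among the survivors of one epidemic.

    All inputs are fractions between 0 and 1. The model is deliberately
    crude: it only compares how many people of each kind survive.
    """
    surviving_resistant = resistant_share * (1.0 - death_rate_resistant)
    surviving_susceptible = (1.0 - resistant_share) * (1.0 - death_rate_susceptible)
    survivors = surviving_resistant + surviving_susceptible
    if survivors == 0:
        raise ValueError("no survivors: the share is undefined")
    return surviving_resistant / survivors


# Hypothetical numbers: 10% of people start out resistant, resistant people
# do not die of the disease, and 30% of susceptible people die per epidemic.
share = 0.10
for epidemic in range(1, 4):
    share = resistant_share_after_epidemic(share, 0.0, 0.30)
    print(f"after epidemic {epidemic}: resistant share = {share:.3f}")
```

With equal death rates in both groups the share stays unchanged, which is the point of the passage: only a difference in mortality shifts the makeup of the population.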

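Likewise, some individuals may be less susceptible to synthetic molecules. Certain synthetic molecules (endocrine disruptors, for example) appear to be the main culprits behind rising infertility in developed countries. Resistance to them would automatically become a target of positive selection, because resistant individuals would have higher fertility. This would, however, require humans to be exposed to these molecules for a long time. Let us hope that does not happen, or human evolution would come to an end!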

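Such inconspicuous physiological evolution may be accompanied by more noticeable induced changes, but these would not be visible for at least several centuries. More conspicuous anatomical changes would take even longer, perhaps thousands of years, and we cannot imagine what the human environment will be like by then.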

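In the long run, on the scale of species evolution, which is measured in millions of years, we can envisage more or less any change in humans, provided we take full account of the present human body plan and the biological principles actually at work. As for science fiction, anything is possible there!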

The fantasy of controlling evolution

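Since the 19th century, eugenics has sought to improve humankind through the "scientific" control of reproduction. Eugenics is often strongly tinged with racism; the policy of extermination pursued by Nazi Germany in the 20th century, together with the sterilization policies applied to millions of people (as in Sweden or the United States before the 1970s), made it a notorious discipline.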

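With advances in medically assisted reproduction and DNA sequencing, eugenic ideas have quietly returned. No one objects when the aim is to avoid implanting an embryo that carries a serious genetic disease; with the same techniques, however, parents could also choose the desired genes for their future child. Large-scale human genome sequencing projects have another purpose as well: to identify genes involved in other traits, such as body size, skin color, intelligence, or even personality.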

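These projects are both illusory and dangerous. They are illusory because, first, who we become is not entirely controlled by genes, and social and family environment may matter more; and second, the "quality" of a gene often depends on the living conditions of its carrier. They are dangerous because the ability to select genes can easily turn into social control. Once it became possible to determine the sex of an embryo or fetus, the number of male births rose sharply in some countries. Finally, any manipulation of an embryo has long-term consequences, because it affects the descendants of the "selected" or modified individual. Today most countries prohibit the modification of human embryos, but financial or political pressure may one day break through the ethical barriers.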

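Another ideology, transhumanism, aims to overcome the current physiological limits of humans by combining disciplines such as synthetic biology, neuroscience, nanotechnology, and computer science. Transhumanism seeks not only to repair the human body but also to "enhance" human physical and intellectual abilities. Unlike eugenics, which aims to improve human nature, transhumanism is concerned first of all with the individual. Some transhumanists, however, have also set long-term goals for the future of the whole species, such as extending our lifespan indefinitely.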

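In recent history, attempts to steer human evolution and create a "new human" have often involved genocide or the massacre of groups that did not fit the standard. These dreams (or nightmares) do nothing to solve the problems facing humanity, such as the overexploitation of resources, overpopulation, epidemics, and poverty. If we really want to change humans, we should first seek to change the relationship between humans and the world, taking full account of the many kinds of diversity within the human species.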

Glossary

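(DNA) sequence: The precise order of the four bases that make up DNA: adenine (A), thymine (T), guanine (G), and cytosine (C).
DNA: Deoxyribonucleic acid, the molecule containing the information needed for an organism's development and functioning. Human DNA consists of 3.2 billion nucleotides (of four kinds: A, T, C, and G). Within the cell, DNA is distributed across several filaments called chromosomes.
Acheulean: A culture dating from 1.4 million to 200,000 years ago, characterized by the making of bifaces and often associated with Homo erectus and Homo heidelbergensis.
Oldowan: The oldest culture created by humans (about 3.3 million to 1.3 million years ago), characterized by the making of crude choppers.
Paranthropus: The full set of species closely related to Australopithecus, with robust bones and huge molars; extinct about 1 million years ago.
Sequencing: Determining the sequence of a DNA fragment or of all the DNA of an individual.
Haplogroup: A DNA segment comprising several genes, whose sequence varies between individuals or populations. Because haplogroups derive from one another through the accumulation of mutations, studying them makes it possible to trace the relationships among them.
Allele: A gene often has several variants with different sequences; these variants are called alleles. Variants may be more or less active, or even completely inactive.
Cladistics: A method of building phylogenetic trees based on new traits shared by species ("derived traits").
Paleoanthropology: The discipline that studies human origins and evolution.
Paleospecies: An extinct animal or plant species known only from fossils.
Gene: A DNA segment containing the information needed to make a substance the cell requires (usually a protein).
Genome: All the DNA of a species. DNA is divided into nuclear DNA and mitochondrial DNA. An individual's genome is that individual's genotype.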
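Paleolithic: The oldest period of prehistory, beginning about 3 million years ago with the appearance of the genus Homo and the first stone tools, and ending 12,000 years ago at the end of the Ice Age.
Lower Paleolithic: The period corresponding to the Oldowan and Acheulean cultures.
Middle Paleolithic: Began about 300,000 years ago. In Europe, this period is associated with Neanderthals and the Mousterian culture.
Upper Paleolithic: In Europe, associated with Homo sapiens; began about 40,000 years ago and ended about 10,000 years ago with the end of the Ice Age.
Old World: Europe, Africa, and Asia, as opposed to the Americas, once called the New World. The name predates the European discovery of Australia and Antarctica.
Biface: A symmetrical knapped stone, often almond-shaped, used as a tool or weapon.
Primates: All monkeys and apes, lemurs, and their common ancestor.
Mousterian: The culture created by Neanderthals and by early Homo sapiens in Africa.
Chromosome: A filament of DNA carrying an individual's genetic information.
Hominids: The primate family that includes all the great apes: orangutans, chimpanzees, bonobos, gorillas, and humans and their ancestors.
Hominins: The primate subtribe comprising those more closely related to Homo sapiens than to chimpanzees, including Sahelanthropus tchadensis, Orrorin tugenensis, Australopithecus, Paranthropus, and all species of the genus Homo.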
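Oreopithecus: An extinct primate found in Italy and East Africa, dating from 9 million to 7 million years ago. Some paleontologists think it could walk on two legs, though how important bipedalism was to it remains unsettled.
Adaptation: The phenomenon by which animals or plants change in the course of evolution as their environment changes.
Adaptive introgression: Introgression is the transfer of genes from one species to another, as occurred when Neanderthals and Homo sapiens interbred. When the transferred gene is useful to the individual and is retained in its genome through positive selection, this is adaptive introgression.
Hammer: A tool made of stone, bone, or antler and used to knap stone; striking a block repeatedly with it produces flakes.
Mutation: A change in the DNA sequence. Mutations occur by chance and are the reason alleles and haplogroups exist. When a gene mutates, its activity often changes.
Species: In biology, a set of individuals that are related as parents and offspring or that can interbreed and produce offspring. This criterion cannot be applied in paleontology, where species are defined from the anatomical features of fossils.
Phylogenetic tree: A tree diagram showing the relationships among several species descended from an ancestral species. A phylogenetic tree can be built for a group of organisms (such as vertebrates or mammals) or for particular species (such as the hominins or the genus Homo).
Mitochondrial DNA: The DNA contained in mitochondria, the organelles responsible for producing energy. Only females can pass on mitochondrial DNA, through the egg cell.
Mosaic evolution: The phenomenon whereby fossil species typically show both primitive and derived traits at once. Evolution does not act on all organs at the same rate. Thus some extinct hominin species could already walk on two legs (an evolutionary innovation) while their brains remained similar to those of their ancestors.
Bladelet: A small tool made of flint or obsidian, often mounted on a support (such as a harpoon or fishhook).
Neolithic: The period that followed the Mesolithic, beginning about 10,000 years ago in the Near East, during which an economy based on hunting and gathering gave way to one based on agricultural production as farming and herding developed.
Sexual dimorphism: Anatomical differences (other than the sexual organs) between female and male individuals of the same species.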
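Derived trait: A trait whose form differs from that of the ancestor, having changed in the course of evolution; also called a "derived character". The non-opposable big toe of humans is a derived trait, because it appears only in the human lineage and distinguishes humans from other primates.
Evolution: The history of the origin and transformation of species since life began on Earth.
Neoteny: The retention of juvenile traits in adulthood, resulting from changes in the timing of development in the course of a species' evolution.
Proconsul: An extinct primate whose oldest fossils date to the Miocene, about 23 million years ago.
Java Man: Fossils discovered in Java by Eugène Dubois in 1891, first named Pithecanthropus erectus and finally assigned to Homo erectus in 1950.
Positive selection: In the evolution of a species, a change in the genome (such as a mutation) that favors the carrier's survival or reproduction is said to be under positive selection; a mutation that shortens the carrier's life or reduces its fertility is under negative selection.
Mesolithic: The period between the Paleolithic, which ended with the Ice Age, and the Neolithic, when plants and animals were widely domesticated; characterized by hunting, fishing, and gathering and by the making of bladelets.
Transcription: The mechanism by which the cell uses the information carried by DNA to make the molecules it needs.
Ancestral trait: A trait whose form is the same as in the ancestor; also called a "primitive trait". The opposable thumb of humans is an ancestral trait, because all primates have had it for at least 50 million years.
Most recent common ancestor: The most recent ancestral species shared by two species; usually unknown.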

2024-11-03