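Editor's Recommendation
This book is the first teaching and research reference in China to systematically introduce a variety of multilevel models. Using the well-known, internationally used statistical software SAS, it demonstrates the application of these models and, through concrete examples, moves step by step from simple to advanced material, showing how to use different SAS procedures, such as PROC MIXED, PROC NLMIXED, and PROC GLIMMIX, to analyze multilevel data.
This book can serve as a textbook for graduate or undergraduate students in relevant programs at comprehensive universities, medical schools, universities of finance and economics, and teacher-training universities, and as a reference for applied practitioners.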
Synopsis
Multilevel Models: Applications Using SAS is written in nontechnical terms and focuses on the methods and applications of various multilevel models, including linear multilevel models, multilevel logistic regression models, multilevel Poisson regression models, and multilevel negative binomial models, as well as some cutting-edge applications such as the multilevel zero-inflated Poisson (ZIP) model, the random-effect zero-inflated negative binomial (RE-ZINB) model, mixed-effect mixed-distribution models, bootstrapping multilevel models, and group-based trajectory models. Readers will learn to build and apply multilevel models for hierarchically structured cross-sectional data and longitudinal data using the internationally distributed software package Statistical Analysis System (SAS). Detailed SAS syntax and output are provided for model applications, giving students, research scientists, and data analysts ready templates for their own applications.
Table of Contents
Chapter 1 Introduction
1.1 Conceptual framework of multilevel modeling
1.2 Hierarchically structured data
1.3 Variables in multilevel data
1.4 Analytical problems with multilevel data
1.5 Advantages and limitations of multilevel modeling
1.6 Computer software for multilevel modeling
Chapter 2 Basics of Linear Multilevel Models
2.1 Intraclass correlation coefficient (ICC)
2.2 Formulation of two-level multilevel models
2.3 Model assumptions
2.4 Fixed and random regression coefficients
2.5 Cross-level interactions
2.6 Measurement centering
2.7 Model estimation
2.8 Model fit, hypothesis testing, and model comparisons
2.8.1 Model fit
2.8.2 Hypothesis testing
2.8.3 Model comparisons
2.9 Explained level-1 and level-2 variances
2.10 Steps for building multilevel models
2.11 Higher-level multilevel models
Chapter 3 Application of Two-level Linear Multilevel Models
3.1 Data
3.2 Empty model
3.3 Predicting between-group variation
3.4 Predicting within-group variation
3.5 Testing random level-1 slopes
3.6 Across-level interactions
3.7 Other issues in model development
Chapter 4 Application of Multilevel Modeling to Longitudinal Data
4.1 Features of longitudinal data
4.2 Limitations of traditional approaches for modeling longitudinal data
4.3 Advantages of multilevel modeling for longitudinal data
4.4 Formulation of growth models
4.5 Data description and manipulation
4.6 Linear growth models
4.6.1 The shape of average outcome change over time
4.6.2 Random intercept growth models
4.6.3 Random intercept and slope growth models
4.6.4 Intercept and slope as outcomes
4.6.5 Controlling for individual background variables in models
4.6.6 Coding time score
4.6.7 Residual variance/covariance structures
4.6.8 Time-varying covariates
4.7 Curvilinear growth models
4.7.1 Polynomial growth model
4.7.2 Dealing with collinearity in higher order polynomial growth model
4.7.3 Piecewise (linear spline) growth model
Chapter 5 Multilevel Models for Discrete Outcome Measures
5.1 Introduction to generalized linear mixed models
5.1.1 Generalized linear models
5.1.2 Generalized linear mixed models
5.2 SAS Procedures for multilevel modeling with discrete outcomes
5.3 Multilevel models for binary outcomes
5.3.1 Logistic regression models
5.3.2 Probit models
5.3.3 Unobserved latent variables and observed binary outcome measures
5.3.4 Multilevel logistic regression models
5.3.5 Application of multilevel logistic regression models
5.3.6 Application of multilevel logit models to longitudinal data
5.4 Multilevel models for ordinal outcomes
5.4.1 Cumulative logit models
5.4.2 Multilevel cumulative logit models
5.5 Multilevel models for nominal outcomes
5.5.1 Multinomial logit models
5.5.2 Multilevel multinomial logit models
5.5.3 Application of multilevel multinomial logit models
5.6 Multilevel models for count outcomes
5.6.1 Poisson regression models
5.6.2 Poisson regression with over-dispersion and a negative binomial model
5.6.3 Multilevel Poisson and negative binomial models
5.6.4 Application of multilevel Poisson and negative binomial models
Chapter 6 Other Applications of Multilevel Modeling and Related Issues
6.1 Multilevel zero-inflated models for count data with extra zeros
6.1.1 Fixed-effect ZIP model
6.1.2 Random effect zero-inflated Poisson (RE-ZIP) models
6.1.3 Random effect zero-inflated negative binomial (RE-ZINB) models
6.1.4 Application of RE-ZIP and RE-ZINB models
6.2 Mixed-effect mixed-distribution models for semi-continuous outcomes
6.2.1 Mixed-effect mixed-distribution model
6.2.2 Application of the mixed-effect mixed-distribution model
6.3 Bootstrap multilevel modeling
6.3.1 Nonparametric residual bootstrap multilevel modeling
6.3.2 Parametric residual bootstrap multilevel modeling
6.3.3 Application of nonparametric residual bootstrap multilevel modeling
6.4 Group-based models for longitudinal data analysis
6.4.1 Introduction to group-based model
6.4.2 Group-based logit model
6.4.3 Group-based zero-inflated Poisson (ZIP) model
6.4.4 Group-based censored normal models
6.5 Missing values issue
6.5.1 Missing data mechanisms and their implications
6.5.2 Handling missing data in longitudinal data analyses
6.6 Statistical power and sample size for multilevel modeling
6.6.1 Sample size estimation for two-level designs
6.6.2 Sample size estimation for longitudinal data analysis
Reference
Excerpt
In the linear model case, this integral can be solved in closed form, and the resulting likelihood or restricted likelihood can be maximized directly. For nonlinear multilevel models, however, the integral usually has no closed form and must be approximated. Many approximation methods have been proposed for this purpose. The two basic methods are: 1) linearization, which approximates the integrated likelihood function using techniques such as Taylor series expansion, and 2) integral approximation with numerical methods. These approaches are implemented in two SAS procedures, PROC GLIMMIX and PROC NLMIXED, and two macros, %GLIMMIX and %NLMIXED, respectively.
Prior to the current version of SAS (SAS 9.2) (SAS Institute Inc., 2008), PROC GLIMMIX was based solely on linearization methods. In version 9.2 of PROC GLIMMIX, linearization is the default estimation method, and two numerical integration methods (the Laplace approximation and adaptive Gauss-Hermite quadrature) have been added as options. The linearization method is also called a pseudo-likelihood method, in which pseudo-data are generated from the original data and the likelihood function is approximated using Taylor series expansions (Schabenberger, 2005). The essential idea of the linearization method is to approximate the GLMM by repeatedly fitting normal linear mixed models. Among the various linearization methods available in the procedure, the default is restricted or residual pseudo-likelihood (REPL) (Wolfinger & O'Connell, 1993). The maximization of the pseudo-likelihood can be carried out by various optimization techniques in PROC GLIMMIX. The default optimization technique is the Newton-Raphson algorithm.
The major advantages of linearization-based methods include: First, they can fit models for which the joint distribution is difficult or impossible to ascertain. Second, compared with numerical integration methods, they allow a larger number of random effects to be estimated in the model. Third, the variance/covariance structure of the level-1 residual matrix (i.e., the R matrix) can be readily accommodated. Fourth, the model is iteratively estimated based on the linear mixed model, so both ML and REML are available for model estimation (Schabenberger, 2005). In addition, in our experience, linearization-based models are much faster to run.
The disadvantages of linearization-based methods include: First, they are based on iterative model estimation using pseudo-data constructed from the original data; as such, they do not have a real likelihood, and therefore the -2LL or deviance statistic cannot be used for model comparisons. Second, PROC GLIMMIX does not support the broad array of variance/covariance structures for the R matrix that is available in PROC MIXED (Schabenberger, 2005).
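The integral-approximation route described in the excerpt can be illustrated with a small sketch. It is not the book's code and not how PROC GLIMMIX or PROC NLMIXED is implemented: it assumes a two-level random-intercept logistic model and uses ordinary (non-adaptive) Gauss-Hermite quadrature from NumPy rather than the adaptive variant named above. The function name, arguments, and example values are hypothetical.

```python
import numpy as np


def cluster_marginal_likelihood(y, eta_fixed, sigma_u, n_points=20):
    """Marginal likelihood of one cluster in a random-intercept logistic model.

    The N(0, sigma_u**2) random intercept u is integrated out with ordinary
    (non-adaptive) Gauss-Hermite quadrature.
    """
    y = np.asarray(y, dtype=float)                  # 0/1 responses in the cluster
    eta_fixed = np.asarray(eta_fixed, dtype=float)  # fixed-effect linear predictors

    # Nodes and weights for integrals of the form  int exp(-x^2) f(x) dx.
    nodes, weights = np.polynomial.hermite.hermgauss(n_points)

    # The change of variables u = sqrt(2) * sigma_u * x maps the normal density
    # onto the exp(-x^2) weight function, leaving a factor 1 / sqrt(pi).
    u = np.sqrt(2.0) * sigma_u * nodes

    # Conditional likelihood of the whole cluster at each quadrature node.
    eta = eta_fixed[:, None] + u[None, :]
    p = 1.0 / (1.0 + np.exp(-eta))
    cond_lik = np.prod(np.where(y[:, None] == 1.0, p, 1.0 - p), axis=0)

    return float(np.sum(weights * cond_lik) / np.sqrt(np.pi))


# Illustrative values only (not taken from the book).
y_example = [1, 0, 1, 1]
eta_example = [0.2, -0.5, 1.0, 0.3]
print(cluster_marginal_likelihood(y_example, eta_example, sigma_u=1.0))
```

Summing the logarithm of this quantity over clusters gives an approximate log-likelihood that an optimizer can maximize. Because it is a genuine likelihood, deviance-based model comparisons remain available, which is the property the excerpt says the pseudo-likelihood approach lacks.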
Preface
Interest in multilevel statistical models for social science and public health studies has grown dramatically since the mid-1980s. New multilevel modeling techniques are giving researchers tools for analyzing data that have a hierarchical or clustered structure. Multilevel models are now applied to a wide range of studies in sociology, population studies, education studies, psychology, economics, epidemiology, and public health.
Individuals and the social contexts to which they belong (e.g., communities, schools, organizations, or geographic locations) are conceptualized as a hierarchical system, in which individuals are micro units and contexts are macro units. Research interest often centers on whether and how individual outcomes vary across contexts, and how that variation is explained by contextual factors; and on whether and how the relationships between the outcome measures and individual characteristics vary across contexts, and how those relationships are influenced or moderated by contextual factors. To address these questions, studies often employ data collected from more than one level of observation units, i.e., observations are collected at both an individual level (e.g., students) and one or more contextual levels (e.g., schools, cities). As a result, the data are characterized by a hierarchical structure in which individuals are nested within units at the higher levels. This kind of data is called hierarchically structured data or multilevel data. Conventional single-level statistical methods, such as ordinary least squares (OLS) regression, are inappropriate for the analysis of multilevel data because observations are non-independent and the contextual effects cannot be addressed appropriately in such models. Multilevel modeling not only takes into account observation dependence in multilevel data, but also provides a more meaningful conceptual framework by allowing assessment of both individual and contextual effects, as well as cross-level interaction effects.
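To make the non-independence argument concrete, the sketch below estimates the intraclass correlation coefficient (ICC) for balanced two-level data and converts it into the Kish design effect, which indicates how much the variance of a mean is inflated relative to what an independence-based analysis assumes. It uses a one-way ANOVA (method-of-moments) estimator rather than the mixed-model estimates the book obtains with SAS, and the function names and example scores are hypothetical.

```python
import numpy as np


def icc_anova(groups):
    """One-way ANOVA estimate of the ICC for balanced data.

    `groups` is a 2-D array-like: one row per level-2 unit, one column per
    level-1 observation (equal group sizes are assumed).
    """
    data = np.asarray(groups, dtype=float)
    n_groups, n_per_group = data.shape
    group_means = data.mean(axis=1)
    grand_mean = data.mean()

    # Between-group and within-group mean squares.
    msb = n_per_group * np.sum((group_means - grand_mean) ** 2) / (n_groups - 1)
    msw = np.sum((data - group_means[:, None]) ** 2) / (n_groups * (n_per_group - 1))

    # Method-of-moments between-group variance, truncated at zero.
    var_between = max((msb - msw) / n_per_group, 0.0)
    return var_between / (var_between + msw)


def design_effect(icc, n_per_group):
    """Kish design effect: variance inflation of a mean under clustering."""
    return 1.0 + (n_per_group - 1) * icc


# Hypothetical scores for 3 schools with 4 students each.
scores = [[52, 55, 51, 54], [60, 63, 61, 62], [45, 47, 44, 48]]
rho = icc_anova(scores)
print(rho, design_effect(rho, n_per_group=4))
```

An ICC of 0.1 with 21 observations per group already gives a design effect of 3, so standard errors computed under independence would be too small by a factor of about the square root of 3.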
This book covers a broad range of topics in multilevel modeling. Our goal is to help students and researchers who are interested in the analysis of multilevel data to understand the basic concepts, theoretical frameworks, and application methods of multilevel modeling. The book is written in non-mathematical terms, focusing on the methods and application of various multilevel models, using the internationally widely used statistical software, the Statistical Analysis System (SAS). Examples are drawn from analyses of real-world research data. We focus on two-level models in this book because they are the most frequently encountered situation in real research. These models can be readily expanded to models with three or more levels when applicable. A wide range of linear and non-linear multilevel models are introduced and demonstrated.
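Analyzing Data from Complex Systems: A Guide to Advanced Modeling and Practice
This book is intended for professionals and advanced students conducting in-depth research in data science, sociology, psychology, education, medicine, and related fields. It aims to provide a systematic, comprehensive, and highly practical framework and set of applied strategies for modeling complex data structures. We focus on moving beyond the limitations of traditional linear models and explore in depth the settings in which data carry an inherent hierarchical structure, repeated measurements, or non-independent observations.
In today's data-driven research environment, researchers increasingly face "nested" or "longitudinal" data structures: students nested within classes and classes nested within schools; repeated measurements on patients at different time points; or data on individuals from different regions and social backgrounds. Simply treating such data as independent observations in a conventional regression analysis not only underestimates standard errors and biases inference, but also ignores the key information carried by the hierarchical structure, namely the interactions between levels and the heterogeneity across them.
The core goal of this book is to build a solid bridge between current statistical theory and its implementation in software, particularly in statistical software environments that excel at fitting complex models. We assume that readers already know the basic principles of statistical inference and the fundamentals of multiple regression, so the book moves directly to the theoretical essentials and practical details of complex models.
---
Part 1. Beyond the Independence Assumption: Theoretical Foundations of Hierarchical Models
This part first establishes a thorough understanding of the nature of complex data structures. We go beyond merely recognizing that "levels" exist and examine how such structure systematically shapes the variance and covariance structure of the data.
Chapter 1. The Challenges of Complex Data and the Logic of Model Selection
We examine in detail what "non-independence" means and how it harms statistical power. Concrete case studies show the bias that standard OLS (ordinary least squares) produces when applied to nested data. We then introduce mixed-effects models as the tool of choice for such problems and make clear their advantages in modeling effects at different levels.
Chapter 2. Building and Interpreting Random Intercept Models
The random intercept model is the starting point of hierarchical analysis. This chapter explains in detail how to specify random intercepts to capture baseline differences among groups (at level 2 or higher). We discuss the theory of variance components and show how to interpret them, for example through the intraclass correlation coefficient (ICC), to quantify how much the hierarchical structure contributes to total variation. Interpreting the distribution of the random intercepts, and what it says about individual differences, is the focus of the chapter.
Chapter 3. Random Slopes and Cross-Level Interactions
Going a step further, this chapter discusses why random slope models are needed. When the strength of a predictor's effect on the outcome itself varies across groups, a random slope model becomes indispensable. We explain in detail how to specify random slopes and how to interpret their joint distribution with the random intercepts. The chapter also sets out the theoretical framework and statistical meaning of cross-level interactions, that is, the moderation of a lower-level predictor's effect by a higher-level variable.
Chapter 4. Statistical Principles of Model Fitting and Convergence Diagnostics
For complex models, especially those with many random effects, convergence is one of the greatest practical challenges. This chapter explains the mathematical differences between maximum likelihood (ML) and restricted maximum likelihood (REML) estimation and shows how to select models using information criteria such as AIC and BIC. We provide a systematic diagnostic workflow for identifying and resolving poor model fit and unstable parameter estimates.
---
Part 2. Extended Applications and Advanced Methodology
With the basic random-effects models in hand, the book turns to more challenging and realistic applications, covering longitudinal data analysis and generalized linear mixed models (GLMMs).
Chapter 5. Longitudinal Data Analysis and Growth Curve Models
Repeated-measures data, such as those from follow-up studies, are a typical application of hierarchical models. This chapter concentrates on growth curve models. We show how to include time in the model as a continuous variable and how to distinguish the roles of random intercepts and random slopes in trajectories over time. The chapter discusses in detail how to handle irregularly spaced measurement occasions and how to use covariates to predict differences in individual trajectories.
Chapter 6. Theoretical Foundations of Generalized Linear Mixed Models (GLMMs)
When the dependent variable is no longer normally distributed (for example, binary, count, or proportion data), the standard mixed model does not apply. This chapter builds the theoretical framework of the GLMM, with emphasis on link functions and the use of exponential-family distributions with hierarchical data. We work through the specific forms that logit, probit, and Poisson/negative binomial models take within a mixed-model structure.
Chapter 7. Implementation Strategies for GLMMs and Handling Special Cases
This chapter focuses on fitting GLMMs in practice. For binary data (e.g., diseased versus not diseased) and count data (e.g., number of events), we give detailed guidance on parameter estimation and interpretation. Special attention is paid to the strengths and weaknesses of numerical methods such as the Laplace approximation and penalized quasi-likelihood (PQL) for fitting complex GLMMs, and readers learn how to judge the reliability of model output.
Chapter 8. Advantages of Bayesian Methods in Hierarchical Modeling
With highly complex hierarchical structures or small samples, conventional maximum likelihood methods may be limited. This chapter shows how Bayesian methods offer a powerful alternative for hierarchical models. We explain how to specify prior distributions and how to run MCMC (Markov chain Monte Carlo) algorithms, with emphasis on interpreting and reporting the posterior distributions of random effects in a Bayesian framework.
---
Part 3. Applying Models and Reporting Results Reliably
This part guides readers in turning theory into persuasive research reports, ensuring robust models and transparent conclusions.
Chapter 9. Model Selection and the Comparison of Nested and Non-Nested Models
This chapter provides a clear decision tree for when random slopes should be introduced and when random intercepts alone suffice. We explain how to use likelihood ratio tests to compare nested models rigorously and which statistical criteria to use when comparing non-nested models.
Chapter 10. Decomposing and Interpreting Effects: Quantifying Hierarchical Effects
The ultimate value of a model lies in its explanatory power. This chapter concentrates on reporting the results of hierarchical models clearly to audiences without a statistical background. We give guidance on interpreting standardized and unstandardized coefficients, with emphasis on decomposing and reporting fixed effects and random-effect variances from different levels and on visualizing complex cross-level interactions.
Chapter 11. Handling Missing Data and Checking Model Robustness
In real-world research, missing data are the norm. This chapter discusses methods for handling missing data within the hierarchical modeling framework, including the limitations of listwise deletion and strategies for applying full-information maximum likelihood (FIML) and multiple imputation in mixed models. It also presents practical techniques for checking robustness by changing model assumptions, such as the residual structure or the distribution of the random effects.
Appendix. Model Design and Reporting Standards
The appendix provides a comprehensive study-design checklist for planning research with complex models, so that data collection can support the subsequent hierarchical analysis. Drawing on the reporting guidelines of major academic journals, it also shows readers how to write a clear, complete report of mixed-effects model results that meets academic standards.
---
Practical guidance throughout: after the theory in each chapter, the book works through the corresponding procedure in current mainstream statistical software, with detailed input-file examples and annotated output, so that readers can apply what they have learned to their own research data right away. We believe that mastering complex data structures is a key step toward greater depth in modern quantitative research.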