limma

R 패키지 메타데이터와 수집 신호를 모아 봅니다.

Packages / CRAN / limma

limma

v3.69.2

limma

Repository BioconductorLicense GPL (>= 2)

표시할 핵심 지표가 없습니다.

Core Signals

첫 화면에서 판단해야 할 수집 신호를 먼저 배치합니다.

표시할 핵심 신호가 없습니다.

Supported Backends

DESCRIPTION에서 감지한 backend 관련 package입니다.

backend package 신호가 없습니다.

Quick Facts

기본 메타데이터를 작은 카드와 토큰으로 압축합니다.

profile

Repository

Bioconductor

Version

3.69.2

License

GPL (>= 2)

Last observed

2026-07-24

CRAN: cran.r-project.org/package=limma

수집 소스별 패키지 정보

1개 소스

Bioconductor

3.69.2

2026-07-24

License

GPL (>= 2)

Depends

R (>= 3.6.0)

Imports

grDevices, graphics, stats, utils, methods, statmod

Suggests

BiasedUrn, ellipse, gplots, knitr, locfit, MASS, splines, affy, AnnotationDbi, Biobase, BiocStyle, GO.db, illuminaio, org.Hs.eg.db, vsn

Last observed

2026-07-24 02:11:40

이 패키지가 의존하는 패키지

0개 표시전체 0개

Package	Type	Spec
표시할 dependency edge가 없습니다.
검색 결과가 없습니다.

1 / 1

이 패키지를 쓰는 패키지

5개 표시전체 78개

Package	Type	Spec
BALLI 0.2.0 CRAN · 2026-07-24	Depends	limma
BioInsight 0.3.1 CRAN · 2026-07-24	Depends	limma
CEDA 1.1.1 CRAN · 2026-07-24	Depends	limma
cp4p 0.3.6 CRAN · 2026-07-24	Depends	limma
DAAGbio 0.63-4 CRAN · 2026-07-24	Depends	limma (>= 2.9.15)
DRomics 2.6-2 CRAN · 2026-07-24	Depends	limma
fmt 2.0 CRAN · 2026-07-24	Depends	limma
PerfMeas 1.2.5 CRAN · 2026-07-24	Depends	limma
ttScreening 1.8 CRAN · 2026-07-24	Depends	limma
batchtma 0.2.0 CRAN · 2026-07-24	Imports	limma
BioUtils 0.1.3 CRAN · 2026-07-24	Imports	limma
BPM 1.0.0 CRAN · 2026-07-24	Imports	limma
Cascade 2.4 CRAN · 2026-07-24	Imports	limma
cinaR 0.2.6 CRAN · 2026-07-24	Imports	limma
DiPALM 1.3 CRAN · 2026-07-24	Imports	limma
dsb 2.0.1 CRAN · 2026-07-24	Imports	limma
eLNNpairedCov 0.3.2 CRAN · 2026-07-24	Imports	limma
GSEMA 0.99.4 CRAN · 2026-07-24	Imports	limma
GWASbyCluster 0.1.7 CRAN · 2026-07-24	Imports	limma
hicream 0.0.4 CRAN · 2026-07-24	Imports	limma
lfproQC 1.4.3 CRAN · 2026-07-24	Imports	limma
lilikoi 2.1.1 CRAN · 2026-07-24	Imports	limma
limorhyde2 0.1.1 CRAN · 2026-07-24	Imports	limma (>= 3.42.2)
lipidomeR 0.1.2 CRAN · 2026-07-24	Imports	limma
MetAlyzer 1.1.0 CRAN · 2026-07-24	Imports	limma
metaMA 3.1.3 CRAN · 2026-07-24	Imports	limma
mi4p 1.3 CRAN · 2026-07-24	Imports	limma
miRtest 2.2 CRAN · 2026-07-24	Imports	limma
MKmisc 2.0 CRAN · 2026-07-24	Imports	limma
MKomics 0.8 CRAN · 2026-07-24	Imports	limma
MSclassifR 0.5.0 CRAN · 2026-07-24	Imports	limma
newIMVC 0.1.0 CRAN · 2026-07-24	Imports	limma
nlcv 0.3.6 CRAN · 2026-07-24	Imports	limma
oncoPredict 1.3.1 CRAN · 2026-07-24	Imports	limma
OncoSubtype 1.0.0 CRAN · 2026-07-24	Imports	limma
Patterns 1.7 CRAN · 2026-07-24	Imports	limma
plfMA 2.0 CRAN · 2026-07-24	Imports	limma
promor 0.2.3 CRAN · 2026-07-24	Imports	limma
RANKS 1.1 CRAN · 2026-07-24	Imports	limma
RCPA 0.2.9 CRAN · 2026-07-24	Imports	limma
robusttseq 0.1.0 CRAN · 2026-07-24	Imports	limma
RPPanalyzer 1.4.9 CRAN · 2026-07-24	Imports	limma
scBio 0.1.6 CRAN · 2026-07-24	Imports	limma
scGOclust 0.2.1 CRAN · 2026-07-24	Imports	limma
scROSHI 1.0.0.0 CRAN · 2026-07-24	Imports	limma
ssizeRNA 1.3.3 CRAN · 2026-07-24	Imports	limma
tinyarray 2.4.3 CRAN · 2026-07-24	Imports	limma
TransProR 1.0.7 CRAN · 2026-07-24	Imports	limma
treediff 0.2.3 CRAN · 2026-07-24	Imports	limma
wrProteo 2.0.2 CRAN · 2026-07-24	Imports	limma
XYomics 0.1.4 CRAN · 2026-07-24	Imports	limma
AnnoProbe 0.1.7 CRAN · 2026-07-24	Suggests	limma
aroma.affymetrix 3.2.3 CRAN · 2026-07-24	Suggests	limma (>= 3.24.1)
canvasXpress 1.59.5 CRAN · 2026-07-24	Suggests	limma
corncob 0.4.2 CRAN · 2026-07-24	Suggests	limma
DGEobj.utils 1.0.6 CRAN · 2026-07-24	Suggests	limma
easybio 1.2.3 CRAN · 2026-07-24	Suggests	limma
ggpicrust2 2.5.17 CRAN · 2026-07-24	Suggests	limma
GiANT 1.3.4 CRAN · 2026-07-24	Suggests	limma
hexbin 1.28.5 CRAN · 2026-07-24	Suggests	limma
inDAGO 1.0.3 CRAN · 2026-07-24	Suggests	limma
IOBR 2.2.3 CRAN · 2026-07-24	Suggests	limma
limorhyde 1.0.3 CRAN · 2026-07-24	Suggests	limma (>= 3.36.1)
maGUI 4.0 CRAN · 2026-07-24	Suggests	limma
NACHO 2.0.6 CRAN · 2026-07-24	Suggests	limma
pctax 0.1.7 CRAN · 2026-07-24	Suggests	limma
pmartR 2.5.1 CRAN · 2026-07-24	Suggests	limma
protti 1.0.0 CRAN · 2026-07-24	Suggests	limma
RepeatedHighDim 2.5.0 CRAN · 2026-07-24	Suggests	limma
SCdeconR 1.0.2 CRAN · 2026-06-11	Suggests	limma
seqgendiff 1.2.4 CRAN · 2026-07-24	Suggests	limma
Seurat 5.5.1 CRAN · 2026-07-24	Suggests	limma
SeuratExplorer 0.1.6 CRAN · 2026-07-24	Suggests	limma
simphony 1.0.3 CRAN · 2026-07-24	Suggests	limma (>= 3.34.9)
st 1.2.7 CRAN · 2026-07-24	Suggests	limma
volcano3D 2.0.11 CRAN · 2026-07-24	Suggests	limma
wrGraph 1.3.16 CRAN · 2026-07-24	Suggests	limma
wrMisc 2.1.1 CRAN · 2026-07-24	Suggests	limma
검색 결과가 없습니다.

1 / 16

Reverse dependency summary

3 types

Type	Packages
Depends	9
Imports	42
Suggests	27

패키지 문서 원문

3 artifacts

field

PDF

Bioconductor · 3.68.2 · Reference Manual · application/pdf · 816,420 · 2026-05-07

Title

PDF

Label

PDF

field

Text

Bioconductor · 3.68.2 · NEWS · text/plain; charset=utf-8 · 78,292 · 2026-05-07

Title

Text

Label

Text

Text content

Changes in version 3.68.0 (2026-04-28)                 

    o   
	New arguments `offset` and `offset.prior` for voom(). voom()
	now supports offsets in DGEList objects.

    o   
	New argument `treat.lfc` for topSplice() to allow TREAT
	fold-change thresholding with diffSplice() output.

    o   
	In normalizeCyclicLoess(), default for `adaptive.span` becomes
	`TRUE`, and the `span` is chosen using the default settings of
	chooseLoessSpan().

    o   
	The function topTableF(), which has been deprecated for many
	years, is now formally removed.

                 Changes in version 3.66.0 (2025-10-28)                 

    o   
	New argument `upshot` for treat().

    o   
	New argument `span` for fitFDistUnequalDF1(), squeezeVar() and
	eBayes().

    o   
	The default for `adaptive.span` in voom() and
	voomWithQualityWeights() is now `TRUE`. All the voom functions
	now use chooseLowessSpan() by default to choose the `span`.

    o   
	New argument `xlab` for plotSplice() to allow labeling in terms
	of transcripts or other splicing events instead of exons.

    o   
	The help pages for diffSplice(), topSplice() and plotSplice()
	have been updated to emphasize differential transcript usage
	instead of differential exon usage or differential splicing.
	"differential usage" concept tags have been added to the
	appropriate help pages, creating an entry in the reference
	manual index.

    o   
	makeContrasts() now uses the contrast names if `contrasts` is a
	named vector.

    o   
	Fix typo in error message from makeContrasts() when the
	variable names are not syntactically valid.

    o   
	Bug fix to contrastAsCoef() when the `contrast` matrix is not
	of full column rank.

    o   
	The columns of the `covariates` matrix input to
	removeBatchEffect() are now mean-corrected.

    o   
	removeBatchEffect() now gives a message if `design` is not
	specified.

    o   
	Bug fix for topTable() when `confint=TRUE` and either `p.value`
	is less than 1 or `lfc` is greater than zero.

    o   
	Revise help page for lmFit(), removing out-of-date mention of
	default correlation.

    o   
	Minor edits to poolVar() and update of help page.

    o   
	Complete remaining dates for release versions in NEWS.Rd.

                 Changes in version 3.64.0 (2025-04-16)                 

    o   
	In topSplice(), the default for `test` is now "F" instead of
	"simes".

    o   
	diffSplice() is now an S3 generic function with a method for
	MArrayLM objects.

    o   
	New argument `legacy` for diffSplice().

    o   
	diffSplice() now adjusts transcript-level p-values using the
	full Simes method, instead of a modified Simes method leaving
	out the largest p-value for each gene that has been used until
	now. This will cause the Simes p-values returned by
	diffSplice() to be slightly more conservative than before.

    o   
	diffSplice() now returns Bonferroni adjusted p-values as an
	alternative to Simes method.

    o   
	The `gene.genes` data.frame returned by diffSplice() now has
	row names set to the gene IDs. Previously the last exon ID for
	each gene was used.

    o   
	diffSplice() now allows for exons with df.residual=0 and
	sigma=NA.

    o   
	Bug fix to diffSplice() when fitFDistUnequalDF1() was called
	with `robust=TRUE`, i.e., whenever `robust=TRUE` and the number
	of transcripts differs between genes.

    o   
	Bug fix to genewise variance pooling in diffSplice() when the
	residual df are unequal. Previously the exon-wise variances
	were not weighted by df when pooling.

    o   
	New argument `directional` for camera() and cameraPR().

    o   
	vooma() now allows missing values in both `y` and `predictor`,
	provided there are no entirely missing rows and provided all
	non-NA y values have a non-NA predictor.

    o   
	voomaLmFit() now allows missing values in `y` and `predictor`,
	provided there are no entirely missing rows and provided all
	non-NA y values have a non-NA predictor.

    o   
	voomaLmFit() now allows `predictor` to be either a matrix or a
	column vector with genewise values.

    o   
	Edit voomaLmFit() help page. Add Li (2024) PhD Thesis as
	reference.

                 Changes in version 3.62.0 (2024-10-30)                 

    o   
	New function fitFDistUnequalDF1() for improved empirical Bayes
	hyperparameter estimation. The new function is an alternative
	to the long-standing limma functions fitFDist() and
	fitFDistRobustly().  The new function gives special attention
	to contexts where the df1 argument is unequal between cases or
	can include small fractional values, such as those resulting
	from edgeR's voomLmFit() or from the edgeR 4.0 quasi-likelihood
	pipeline.  The new function uses profile maximum likelihood
	instead of moment estimation to estimate df.prior, and so may
	be more accurate than fitFDist() or fitFDistRobustly() even
	when the df1 values are all equal. The new hyperpameter
	estimation will be used by default by eBayes(), treat() and
	squeezeVar() whenever the residual degrees of freedom are
	unequal, but users can request legacy behavior for those
	functions by setting `legacy=TRUE`.

    o   
	New argument `adaptive.span` for voomWithQualityWeights() and
	normalizeCyclicLoess(), with the same purpose as the same
	argument for voom(). If `TRUE` then an optimal value for the
	span is chosen based on the number of genes in the dataset.
	voom() and voomWithQualityWeights() now return the chosen
	`span` as part of the output object if `adaptive.span=TRUE`.

    o   
	contrasts.fit() now converts the coefficient name "(Intercept)"
	created automatically by model.matrix() to "Intercept", same as
	is done by makeContrasts(). This should avoid inconsistent
	coefficient names between the fitted model object and contrast
	matrices created by makeContrasts().

    o   
	New argument `keep.EList` for voomaLmFit(), similar to the same
	argument for edgeR::voomLmFit(). The observation weights
	generated by voomaLmFit() are now stored only if
	`keep.EList=TRUE`.

    o   
	voomaByGroup() now uses the overall mean-variance trend for
	groups with only one sample, for which a separate trend is not
	estimable.

    o   
	Add Ravindra et al (2023) as a reference and Mengbo Li as an
	author to the voomaByGroup() help page.  Emphasize in the help
	details that only simple design matrices are supported by this
	function.

    o   
	Revise the eBayes/treat help page to further clarify the
	`legacy` and `fc` arguments.

    o   
	Bug fix to voomaByGroup() to save variance plot parameters
	correctly when `save.plot=TRUE`.

    o   
	Remove argument `zero.weights` from the MArrayLM method for
	plotMD() and plotMA().

    o   
	Fix bug in the MArrayLM methods for plotMD() and plotMA(),
	which were attempting to access a `weights` component of the
	fitted model object, analogously to what plotMD() does with the
	EList or MAList objects, even though MArrayLM objects do not
	contain a `weights` component.

    o   
	Use of the stats package digamma() function replaced with
	statmod's logmdigamma() in fitFDist(), fitFDistUnequalDF1() and
	intraspotCorrelation().  Slight improvements to speed, accuracy
	and code simplicity, but results should remain the same to at
	least 13 significant figures.

                 Changes in version 3.60.0 (2024-05-01)                 

    o   
	The default settings for the `small.n` and `min.span` arguments
	of chooseLowessSpan() have been increased, increasing the
	chosen span value.

    o   
	New argument `adaptive.span` for voom(). If `TRUE`, then
	chooseLowessSpan() is used to select an optimal `span` value
	depending on the number of genes, same as is done for vooma(),
	voomaByGroup() and voomaLmFit().

    o   
	The plot information saved by voom() when `save.plot=TRUE` now
	includes `pch` and `cex` plotting character settings.

    o   
	The default `span` set by vooma() is increased slightly to the
	value given by `chooseLowessSpan(ngenes, small.n=50,
	min.span=0.3, power=1/3)`. A ne

news

Bioconductor 3.23

Bioconductor · 3.68.2 · text/html · 360,277 · 2026-05-07

Title

Bioconductor - Bioconductor 3.23 Released

Label

Bioconductor 3.23

Text content

Bioconductor - Bioconductor 3.23 Released window.dataLayer = window.dataLayer || []; function gtag() { dataLayer.push(arguments); } gtag("js", new Date()); gtag("config", "G-WJMEEH1J58"); Bioconductor 3.23 Release! About Learn Packages Developers Search Get Started Menu Home Bioconductor 3.23 Released April 29, 2026 Bioconductor: We are pleased to announce Bioconductor 3.23, consisting of 2418 software packages, 437 experiment data packages, 928 annotation packages, 28 workflows and 8 books. There are 94 new software packages, 5 new data experiment packages, no new annotation packages, no new workflows, 2 new books, and many updates and improvements to existing packages. Bioconductor 3.23 is compatible with R 4.6, and is supported on Linux, 64-bit Windows, Intel 64-bit macOS 11 (Big Sur) or higher, macOS arm64 and Linux arm64. This release will also include updated Bioconductor Docker containers . Note: Currently Bioconductor does not have active daily windows builders. We will be testing the windows and mac binaries generated through the r-universe system. These updated windows and mac binaries will be available shortly after this release. We appreciate your patience as we make them available. Thank you to everyone for your contribution to Bioconductor. Visit Bioconductor BiocViews for details and downloads. Contents Getting Started with Bioconductor 3.23 New Software Packages New Data Experiment Packages New Annotation Packages New Workflow New Books NEWS from existing software packages NEWS from existing data experiment packages Deprecated and Defunct Packages Getting Started with Bioconductor 3.23 To update to or install Bioconductor 3.23: Install R 4.6. Bioconductor 3.23 has been designed expressly for this version of R. Follow the instructions at Installing Bioconductor . New Software Packages There are 94 new software packages in this release of Bioconductor. Aerith Visualisation of peptide isotopic peaks and SIP peptide spectra match (PSM). Filtration of high quality PSM. Accurate isotopic abundance calculation of peptide and metabolites. Visualisation of SIP proteomics results. annoLinker Fast annotation of genomic peaks using DNA interaction data by constructing interaction networks with igraph, where peaks overlapping any node in a connected subgraph are annotated with all genes in that subgraph. The annotation evidence could be visualized as either a network graph or a genomic track integrated with gene annotation information. asuri The ASURI (Analysis of SUrvival and patients RIsk prediction based on gene signatures) package discovers marker genes that are related to risk prediction capabilities and to a clinical variable of interest. It uses two main steps, including subsampling glmnet and unicox. The package implements robust functions to discover survival markers related to a clinical phenotype and to predict a risk score, allowing to study the patient’s risk based on the gene signatures. Several plots are provided to visualise the relevance of the genes, the risk score, and patient stratification, as well as a robust version of the Kaplan-Meier curves. atacInferCnv The package prepares input scATAC-seq data and adapts for copy number variance profiling with InferCNV package usage. It has also various paramters to control the analysis (e.g. external normal reference usage, meta-cells, bin size, etc) and custom plot visualizations. BatChef This package implements a variety of methods for batch correction in single-cell RNA sequencing (scRNA-seq) data. It incorporates quantitative metrics (e.g. Wasserstein distance, Adjusted Rand Index) to evaluate their performance. Furthermore, the package assists users in identifying and applying the optimal method for specific datasets. Battlefield Battlefield is a Swiss-army toolkit originally developed to define and extract spatial spots from specific tissue regions—such as front regions, niche borders, invasive margins, and cluster interfaces—using spatial transcriptomics data or clustered tissue maps. It has since been extended to support trajectory selection and layer inspection, and now provides a collection of low-level utilities for spatial transcriptomics analysis. These utilities are primarily intended to be reused within higher-level analytical packages. It is designed to work with sequencing-based platforms such as Visium at several resolutions and Visium HD(binned). betterChromVAR A much faster analytical implementation of chromVAR, with additional features, used to infer TF activity from (bulk or single-cell) ATAC-seq data and motif annotations (or binding probabilities). The package also includes the CVnorm normalization method based on the chromVAR logic. BiocAzul Represents the OpenAPI v2 Azul API as an R object for performing requests. The infrastructure uses the AnVIL and rapiclient packages. Users can connect to either the AnVIL or Human Cell Atlas Data Explorers. BiocBuildReporter This package reads remote parquet files that have processed Bioconductor build report logs. Users may query the tables directly for specific information or use pre-defined helper functions for common queries. The logs processed are from https://bioconductor.org/checkResults/. In the future we will extend this package out to include processing of r-universe logs. BiocMaintainerApp This package allows interactive viewing of package maintainer information. The Bioconductor Package Maintainer Application sends yearly verification emails to accept Bioconductor policies; this application also depicts maintainer status on opting in and if the email is deemed valid. BiocPkgDash This package provides an interactive Shiny dashboard for Bioconductor package maintainers. It visualizes various package statuses, metadata, and development metrics, offering insights into package health and activity. This tool aims to support maintainers of multiple packages by filtering packages via maintainer email. carnation Highly interactive & modular shiny app to explore three facets of RNA-Seq analysis: differential expression (DE), functional enrichment and pattern analysis. Several visualizations are implemented to provide a wide-ranging view of data sets. For DE analysis, we provide PCA plot, MA plot, Upset plot & heatmaps, in addition to a highly customizable gene plot. Seven different visualizations are available for functional enrichment analysis, and we also support gene pattern analysis. Genes of interest can be tracked across all modules using the gene scratchpad. In addition, carnation provides an integrated platform to manage multiple projects and user access that can be run on a central server to share with collaborators. CellMentor Implements supervised cell type-aware non-negative matrix factorization (NMF) for dimensional reduction in single-cell RNA sequencing analysis. The package provides methods for incorporating cell type information into the dimensionality reduction process, enabling improved visualization and downstream analysis of single-cell data while preserving biological structure. CellMentor employs a unique loss function that simultaneously minimizes variation within known cell populations while maximizing distinctions between different cell types, enabling effective transfer of learned patterns from labeled reference datasets to new unlabeled data. ClonalSim ClonalSim generates realistic mutational profiles of tumor samples with hierarchical clonal structure. It simulates founder, shared, and private mutations with biologically realistic noise models including intra-tumor heterogeneity (Beta distribution) and technical sequencing noise (negative binomial depth variation, binomial read sampling, base errors). The package is designed for benchmarking variant callers, testing clonal deconvolution algorithms, and teaching tumor heterogeneity concepts. ClusterGVis Provides a streamlined workflow for clustering and visualizing gene expression patterns, particularly from time-series RNA-Seq and sin

Reference for limma (3.68.2)

80개 topic

01.Introduction

Introduction to the LIMMA Package

Bioconductor · 3.68.2 · package · limma/man/01Introduction.Rd · 2026-05-07

LIMMA is a package for the analysis of gene expression microarray data, especially the use of linear models for analysing designed experiments and the assessment of differential expression. LIMMA provides the ability to analyse comparisons between many RNA targets simultaneously in arbitrary complicated designed experiments. Empirical Bayesian methods are used to provide stable results even when the number of arrays is small. The linear model and differential expression functions apply to all gene expression technologies, including microarrays, RNA-seq and quantitative PCR.

Aliases

01.Introductionlimmalimma-package

Keywords

documentation

Details

There are three types of documentation available: The LIMMA User's Guide can be reached through the "User Guides and Package Vignettes" links at the top of the LIMMA contents page. The function limmaUsersGuide gives the file location of the User's Guide. An overview of limma functions grouped by purpose is contained in the numbered chapters at the foot of the LIMMA package index page, of which this page is the first. The LIMMA contents page gives an alphabetical index of detailed help topics. The function changeLog displays the record of changes to the package.

See also

02.Classes, 03.ReadingData, 04.Background, 05.Normalization, 06.LinearModels, 07.SingleChannel, 08.Tests, 09.Diagnostics, 10.GeneSetTests, 11.RNAseq

Author

Gordon Smyth, with contributions from many colleagues

References

Law CW, Chen Y, Shi W, Smyth GK (2014). Voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology 15, R29. 10.1186/gb-2014-15-2-r29. See also the Preprint Version at https://gksmyth.github.io/pubs/VoomPreprint.pdf incorporating some notational corrections. Phipson B, Lee S, Majewski IJ, Alexander WS, and Smyth GK (2016). Robust hyperparameter estimation protects against hypervariable genes and improves power to detect differential expression. Annals of Applied Statistics 10, 946-963. 10.1214/16-AOAS920 Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, Smyth GK (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research 43, e47. 10.1093/nar/gkv007 Smyth GK (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology Volume 3, Issue 1, Article 3. 10.2202/1544-6115.1027. See also the Preprint Version https://gksmyth.github.io/pubs/ebayes.pdf incorporating corrections to 30 June 2009.

02.Classes

Topic: Classes Defined by this Package

Bioconductor · 3.68.2 · limma/man/02classes.Rd · 2026-05-07

This package defines the following data classes. [limma:rglist]RGList A class used to store raw intensities as they are read in from an image analysis output file, usually by read.maimages. [limma:malist]MAList Intensities converted to M-values and A-values, i.e., to with-spot and whole-spot contrasts on the log-scale. Usually created from an RGList using MA.RG or normalizeWithinArrays. Objects of this class contain one row for each spot. There may be more than one spot and therefore more than one row for each probe. [limma:EList]EListRaw A class to store raw intensities for one-channel microarray data. May or may not be background corrected. Usually created by read.maimages. [limma:EList]EList A class to store normalized log2 expression values for one-channel microarray data. Usually created by normalizeBetweenArrays. [limma:marraylm]MArrayLM Store the result of fitting gene-wise linear models to the normalized intensities or log-ratios. Usually created by lmFit. Objects of this class normally contain only one row for each unique probe. [limma:TestResults]TestResults Store the results of testing a set of contrasts equal to zero for each probe. Usually created by decideTests. Objects of this class normally contain one row for each unique probe. All these data classes obey many analogies with matrices. In the case of RGList, MAList, EListRaw and EList, rows correspond to spots or probes and columns to arrays. In the case of MarrayLM, rows correspond to unique probes and the columns to parameters or contrasts. The functions summary, dim, length, ncol, nrow, dimnames, rownames, colnames have methods for these classes. Objects of any of these classes may be [limma:subsetting]subsetted. Multiple data objects may be [limma:cbind]combined by rows (to add extra probes) or by columns (to add extra arrays). Furthermore all of these classes may be coerced to actually be of class matrix using as.matrix, although this entails loss of information. Fitted model objects of class MArrayLM can be coerced to class data.frame using [limma:asdataframe]as.data.frame. The first three classes belong to the virtual class [limma:LargeDataObject]LargeDataObject. A show method is defined for LargeDataOjects which uses the utility function printHead.

Aliases

02.Classes

Keywords

documentation

See also

01.Introduction, 02.Classes, 03.ReadingData, 04.Background, 05.Normalization, 06.LinearModels, 07.SingleChannel, 08.Tests, 09.Diagnostics, 10.GeneSetTests, 11.RNAseq

Author

Gordon Smyth

03.ReadingData

Topic: Reading Microarray Data from Files

Bioconductor · 3.68.2 · limma/man/03reading.Rd · 2026-05-07

This help page gives an overview of LIMMA functions used to read data from files.

Aliases

03.ReadingData

Keywords

documentation

See also

01.Introduction, 02.Classes, 03.ReadingData, 04.Background, 05.Normalization, 06.LinearModels, 07.SingleChannel, 08.Tests, 09.Diagnostics, 10.GeneSetTests, 11.RNAseq

Custom sections

Reading Target Information

The function readTargets is designed to help with organizing information about which RNA sample is hybridized to each channel on each array and which files store information for each array.

Reading Intensity Data

The first step in a microarray data analysis is to read into R the intensity data for each array provided by an image analysis program. This is done using the function read.maimages. read.maimages optionally constructs quality weights for each spot using quality functions listed in QualityWeights. If the data is two-color, then read.maimages produces an RGList object. If the data is one-color (single channel) then an EListRaw object is produced. In either case, read.maimages stores only the information required from each image analysis output file. read.maimages uses utility functions removeExt, read.imagene and read.columns. There are also a series of utility functions which read the header information from image output files including readGPRHeader, readImaGeneHeader and readGenericHeader. read.ilmn reads probe or gene summary profile files from Illumina BeadChips, and produces an ElistRaw object. read.idat reads Illumina files in IDAT format, and produces an EListRaw object. detectionPValues can be used to add detection p-values. The function as.MAList can be used to convert a marrayNorm object to an MAList object if the data was read and normalized using the marray and marrayNorm packages.

Reading the Gene List

Most image analysis software programs provide gene IDs as part of the intensity output files, for example GenePix, Imagene and the Stanford Microarray Database do this. In other cases the probe ID and annotation information may be in a separate file. The most common format for the probe annotation file is the GenePix Array List (GAL) file format. The function readGAL reads information from a GAL file and produces a data frame with standard column names. The function getLayout extracts from the GAL-file data frame the print layout information for a spotted array. The functions gridr, gridc, spotr and spotc use the extracted layout to compute grid positions and spot positions within each grid for each spot. The function printorder calculates the printorder, plate number and plate row and column position for each spot given information about the printing process. The utility function getSpacing converts character strings specifying spacings of duplicate spots to numeric values. The Australian Genome Research Facility in Australia often produces GAL files with composite probe IDs or names consisting of multiple strings separated by a delimiter. These can be separated into name and annotation information using strsplit2. If each probe is printed more than once of the arrays in a regular pattern, then uniquegenelist will remove duplicate names from the gal-file or gene list.

Identifying Control Spots

The functions readSpotTypes and controlStatus assist with separating control spots from ordinary genes in the analysis and data exploration.

Manipulating Data Objects

[limma:cbind]cbind, [limma:cbind]rbind, [limma:merge]merge allow different RGList or MAList objects to be combined. cbind combines data from different arrays assuming the layout of the arrays to be the same. merge can combine data even when the order of the probes on the arrays has changed. merge uses utility function makeUnique.

Author

Gordon Smyth

04.Background

Topic: Background Correction

Bioconductor · 3.68.2 · limma/man/04Background.Rd · 2026-05-07

This page deals with background correction methods provided by the backgroundCorrect, kooperberg or neqc functions. Microarray data is typically background corrected by one of these functions before normalization and other downstream analysis. backgroundCorrect works on matrices, EListRaw or RGList objects, and calls backgroundCorrect.matrix. The movingmin method of backgroundCorrect uses utility functions ma3x3.matrix and ma3x3.spottedarray. The normexp method of backgroundCorrect uses utility functions normexp.fit and normexp.signal. kooperberg is a Bayesian background correction tool designed specifically for two-color GenePix data. It is computationally intensive and requires several additional columns from the GenePix data files. These can be read in using read.maimages and specifying the other.columns argument. neqc is for single-color data. It performs normexp background correction and quantile normalization using control probes. It uses utility functions normexp.fit.control and normexp.signal. If robust=TRUE, then normexp.fit.control uses the function huber in the MASS package.

Aliases

04.Background

Keywords

documentation

See also

01.Introduction, 02.Classes, 03.ReadingData, 04.Background, 05.Normalization, 06.LinearModels, 07.SingleChannel, 08.Tests, 09.Diagnostics, 10.GeneSetTests, 11.RNAseq

Author

Gordon Smyth

05.Normalization

Topic: Normalization of Microarray Data

Bioconductor · 3.68.2 · limma/man/05Normalization.Rd · 2026-05-07

This page gives an overview of the LIMMA functions available to normalize data from single-channel or two-colour microarrays. Smyth and Speed (2003) give an overview of the normalization techniques implemented in the functions for two-colour arrays. Usually data from spotted microarrays will be normalized using normalizeWithinArrays. A minority of data will also be normalized using normalizeBetweenArrays if diagnostic plots suggest a difference in scale between the arrays. In rare circumstances, data might be normalized using normalizeForPrintorder before using normalizeWithinArrays. All the normalization routines take account of spot quality weights which might be set in the data objects. The weights can be temporarily modified using modifyWeights to, for example, remove ratio control spots from the normalization process. If one is planning analysis of single-channel information from the microarrays rather than analysis of differential expression based on log-ratios, then the data should be normalized using a single channel-normalization technique. Single channel normalization uses further options of the normalizeBetweenArrays function. For more details see the [=limmaUsersGuide]LIMMA User's Guide which includes a section on single-channel normalization. normalizeWithinArrays uses utility functions MA.RG, loessFit and normalizeRobustSpline. normalizeBetweenArrays is the main normalization function for one-channel arrays, as well as an optional function for two-colour arrays. normalizeBetweenArrays uses utility functions normalizeMedianValues, normalizeMedianAbsValues, normalizeQuantiles and normalizeCyclicLoess, none of which need to be called directly by users. normalizeCyclicLoess calls chooseLowessSpan when adaptive.span=TRUE. neqc is a between array normalization function customized for Illumina BeadChips. The function normalizeVSN is also provided as a interface to the vsn package. It performs variance stabilizing normalization, an algorithm which includes background correction, within and between normalization together, and therefore doesn't fit into the paradigm of the other methods. removeBatchEffect can be used to remove a batch effect, associated with hybridization time or some other technical variable, prior to unsupervised analysis.

Aliases

05.Normalization

Keywords

documentation

See also

01.Introduction, 02.Classes, 03.ReadingData, 04.Background, 05.Normalization, 06.LinearModels, 07.SingleChannel, 08.Tests, 09.Diagnostics, 10.GeneSetTests, 11.RNAseq

Author

Gordon Smyth

References

Smyth GK, Speed TP (2003). Normalization of cDNA microarray data. Methods 31, 265-273. https://gksmyth.github.io/pubs/normalize.pdf

06.LinearModels

Topic: Linear Models for Microarrays

Bioconductor · 3.68.2 · limma/man/06linearmodels.Rd · 2026-05-07

This page gives an overview of the LIMMA functions available to fit linear models and to interpret the results. This page covers models for two color arrays in terms of log-ratios or for single-channel arrays in terms of log-intensities. If you wish to fit models to the individual channel log-intensities from two colour arrays, see 07.SingleChannel. The core of this package is the fitting of gene-wise linear models to microarray data. The basic idea is to estimate log-ratios between two or more target RNA samples simultaneously. See the LIMMA User's Guide for several case studies.

Aliases

06.LinearModels

Keywords

documentation

See also

01.Introduction, 02.Classes, 03.ReadingData, 04.Background, 05.Normalization, 06.LinearModels, 07.SingleChannel, 08.Tests, 09.Diagnostics, 10.GeneSetTests, 11.RNAseq

Custom sections

Fitting Models

The main function for model fitting is lmFit. This is recommended interface for most users. lmFit produces a fitted model object of class [limma:marraylm]MArrayLM containing coefficients, standard errors and residual standard errors for each gene. lmFit calls one of the following three functions to do the actual computations: lm.series Straightforward least squares fitting of a linear model for each gene. mrlm An alternative to lm.series using robust regression as implemented by the rlm function in the MASS package. gls.series Generalized least squares taking into account correlations between duplicate spots (i.e., replicate spots on the same array) or related arrays. The function duplicateCorrelation is used to estimate the inter-duplicate or inter-block correlation before using gls.series. All the functions which fit linear models use linkgetEAW to extract data from microarray data objects, and unwrapdups which provides an unified method for handling duplicate spots.

Forming the Design Matrix

lmFit has two main arguments, the expression data and the design matrix. The design matrix is essentially an indicator matrix which specifies which target RNA samples were applied to each channel on each array. There is considerable freedom in choosing the design matrix - there is always more than one choice which is correct provided it is interpreted correctly. Design matrices for Affymetrix or single-color arrays can be created using the function [stats]model.matrix which is part of the R base package. The function modelMatrix is provided to assist with creation of an appropriate design matrix for two-color microarray experiments. For direct two-color designs, without a common reference, the design matrix often needs to be created by hand.

Making Comparisons of Interest

Once a linear model has been fit using an appropriate design matrix, the command makeContrasts may be used to form a contrast matrix to make comparisons of interest. The fit and the contrast matrix are used by contrasts.fit to compute fold changes and t-statistics for the contrasts of interest. This is a way to compute all possible pairwise comparisons between treatments for example in an experiment which compares many treatments to a common reference.

Assessing Differential Expression

After fitting a linear model, the standard errors are moderated using a simple empirical Bayes model using eBayes or treat. A moderated t-statistic and a log-odds of differential expression is computed for each contrast for each gene. treat tests whether log-fold-changes are greater than a threshold rather than merely different to zero. eBayes and treat use internal functions squeezeVar, fitFDist, fitFDistRobustly, fitFDistUnequalDF1, tmixture.matrix and tmixture.vector.

Summarizing Model Fits

After the above steps the results may be displayed or further processed using: topTable Presents a list of the genes most likely to be differentially expressed for a given contrast or set of contrasts. topTreat Presents a list of the genes most likely to be differentially expressed for one specified coefficient or contrast, following treat. volcanoplot Volcano plot of fold change versus the B-statistic for any fitted coefficient. plotlines Plots fitted coefficients or log-intensity values for time-course data. genas Estimates and plots biological correlation between two coefficients. write.fit Writes an MarrayLM object to a file. Note that if fit is an MArrayLM object, either write.fit or write.table can be used to write the results to a delimited text file. For multiple testing functions which operate on linear model fits, see 08.Tests.

Model Selection

selectModel provides a means to choose between alternative linear models using AIC or BIC information criteria.

Author

Gordon Smyth

References

Phipson, B, Lee, S, Majewski, IJ, Alexander, WS, and Smyth, GK (2016). Robust hyperparameter estimation protects against hypervariable genes and improves power to detect differential expression. Annals of Applied Statistics 10, 946-963. http://projecteuclid.org/euclid.aoas/1469199900 Smyth, G. K. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology, 3, No. 1, Article 3. https://gksmyth.github.io/pubs/ebayes.pdf Smyth, G. K., Michaud, J., and Scott, H. (2005). The use of within-array replicate spots for assessing differential expression in microarray experiments. Bioinformatics 21(9), 2067-2075.

07.SingleChannel

Topic: Individual Channel Analysis of Two-Color Microarrays

Bioconductor · 3.68.2 · limma/man/07SingleChannel.Rd · 2026-05-07

This page gives an overview of the LIMMA functions fit linear models to two-color microarray data in terms of the log-intensities rather than log-ratios. The function intraspotCorrelation estimates the intra-spot correlation between the two channels. The regression function lmscFit takes the correlation as an argument and fits linear models to the two-color data in terms of the individual log-intensities. The output of lmscFit is an MArrayLM object just the same as from lmFit, so inference proceeds in the same way as for log-ratios once the linear model is fitted. See 06.LinearModels. The function targetsA2C converts two-color format target data frames to single channel format, i.e, converts from array-per-line to channel-per-line, to facilitate the formulation of the design matrix.

Aliases

07.SingleChannel

Keywords

documentation

See also

01.Introduction, 02.Classes, 03.ReadingData, 04.Background, 05.Normalization, 06.LinearModels, 07.SingleChannel, 08.Tests, 09.Diagnostics, 10.GeneSetTests, 11.RNAseq

Author

Gordon Smyth

08.Tests

Topic: Hypothesis Testing for Linear Models

Bioconductor · 3.68.2 · limma/man/08Tests.Rd · 2026-05-07

LIMMA provides a number of functions for multiple testing across both contrasts and genes. The starting point is an MArrayLM object, called fit say, resulting from fitting a linear model and running eBayes and, optionally, contrasts.fit. See 06.LinearModels or 07.SingleChannel for details.

Aliases

08.Tests

Keywords

documentation

See also

01.Introduction, 02.Classes, 03.ReadingData, 04.Background, 05.Normalization, 06.LinearModels, 07.SingleChannel, 08.Tests, 09.Diagnostics, 10.GeneSetTests, 11.RNAseq

Custom sections

Multiple testing across genes and contrasts

The key function is decideTests. This function writes an object of class [limma:TestResults]TestResults, which is basically a matrix of -1, 0 or 1 elements, of the same dimension as fit$coefficients, indicating whether each coefficient is significantly different from zero. A number of different multiple testing strategies are provided. decideTests calls classifyTestsF to implement the nested F-test strategt. selectModel chooses between linear models for each probe using AIC or BIC criteria. This is an alternative to hypothesis testing and can choose between non-nested models. A number of other functions are provided to display the results of decideTests. The functions heatDiagram (or the older version heatdiagram displays the results in a heat-map style display. This allows visual comparison of the results across many different conditions in the linear model. The functions vennCounts and vennDiagram provide Venn diagram style summaries of the results. Summary and show method exists for objects of class TestResults. The results from decideTests can also be included when the results of a linear model fit are written to a file using write.fit.

Gene Set Tests

Competitive gene set testing for an individual gene set is provided by wilcoxGST or geneSetTest, which permute genes. The gene set can be displayed using barcodeplot. Self-contained gene set testing for an individual set is provided by roast, which uses rotation technology, analogous to permuting arrays. Gene set enrichment analysis for a large database of gene sets is provided by romer. topRomer is used to rank results from romer. The functions alias2Symbol, alias2SymbolTable and alias2SymbolUsingNCBI are provided to help match gene sets with microarray probes by way of official gene symbols.

Global Tests

The function genas can test for associations between two contrasts in a linear model. Given a set of p-values, the function propTrueNull can be used to estimate the proportion of true null hypotheses. When evaluating test procedures with simulated or known results, the utility function auROC can be used to compute the area under the Receiver Operating Curve for the test results for a given probe.

Author

Gordon Smyth

09.Diagnostics

Topic: Diagnostics and Quality Assessment

Bioconductor · 3.68.2 · limma/man/09Diagnostics.Rd · 2026-05-07

This page gives an overview of the LIMMA functions available for microarray quality assessment and diagnostic plots. This package provides an [limma:anova-method]anova method which is designed for assessing the quality of an array series or of a normalization method. It is not designed to assess differential expression of individual genes. [limma:anova-method]anova uses utility functions bwss and bwss.matrix. The function arrayWeights estimates the empirical reliability of each array following a linear model fit. Diagnostic plots can be produced by imageplot Produces a spatial picture of any spot-specific measure from an array image. If the log-ratios are plotted, then this produces an in-silico representation of the well known false-color TIFF image of an array. imageplot3by2 will write imageplots to files, six plots to a page. plotFB Plots foreground versus background log-intensies. plotMD Mean-difference plots. Very versatile plot. For two color arrays, this plots the M-values vs A-values. For single channel technologies, this plots one column of log-expression values vs the average of the other columns. For fitted model objects, this plots a log-fold-change versus average log-expression. mdplot can also be useful for comparing two one-channel microarrays. plotMA MA-plots, essentially the same as mean-difference plots. plotMA3by2 will write MA-plots to files, six plots to a page. plotWithHighlights Scatterplots with highlights. This is the underlying engine for plotMD and plotMA. plotPrintTipLoess Produces a grid of MA-plots, one for each print-tip group on an array, together with the corresponding lowess curve. Intended to help visualize print-tip loess normalization. plotPrintorder For an array, produces a scatter plot of log-ratios or log-intensities by print order. plotDensities Individual channel densities for one or more arrays. An essential plot to accompany between array normalization, especially quantile normalization. plotMDS Multidimensional scaling plot for a set of arrays. Useful for visualizing the relationship between the set of samples. plotSA Sigma vs A plot. After a linear model is fitted, this checks constancy of the variance with respect to intensity level. plotPrintTipLoess uses utility functions gridr and gridc. plotDensities uses utility function RG.MA.

Aliases

09.Diagnostics

Keywords

documentation

See also

01.Introduction, 02.Classes, 03.ReadingData, 04.Background, 05.Normalization, 06.LinearModels, 07.SingleChannel, 08.Tests, 09.Diagnostics, 10.GeneSetTests, 11.RNAseq

Author

Gordon Smyth

10.GeneSetTests

Topic: Gene Set Tests

Bioconductor · 3.68.2 · limma/man/10GeneSetTests.Rd · 2026-05-07

This page gives an overview of the LIMMA functions for gene set testing and pathway analysis. roast Self-contained gene set testing for one set. Uses zscoreT to normalize t-statistics. mroast Self-contained gene set testing for many sets. Uses zscoreT to normalize t-statistics. fry Fast approximation to mroast, especially useful when heteroscedasticity of genes can be ignored. camera Competitive gene set testing. cameraPR Competitive gene set testing with a pre-ranked gene set. romer and topRomer Gene set enrichment analysis. ids2indices Convert gene sets consisting of vectors of gene identifiers into a list of indices suitable for use in the above functions. alias2Symbol, alias2SymbolTable and alias2SymbolUsingNCBI Convert gene symbols or aliases to current official symbols. geneSetTest or wilcoxGST Simple gene set testing based on gene or probe permutation. barcodeplot Enrichment plot of a gene set. goana and topGO Gene ontology over-representation analysis of gene lists using Entrez Gene IDs. goana can work directly on a fitted model object or on one or more lists of genes. kegga and topKEGG KEGG pathway over-representation analysis of gene lists using Entrez Gene IDs. kegga can work directly on a fitted model object or on one or more lists of genes.

Aliases

10.GeneSetTests

Keywords

documentation

Concepts

gene set tests

See also

01.Introduction, 02.Classes, 03.ReadingData, 04.Background, 05.Normalization, 06.LinearModels, 07.SingleChannel, 08.Tests, 09.Diagnostics, 10.GeneSetTests, 11.RNAseq

Author

Gordon Smyth

11.RNAseq

Topic: Analysis of RNA-seq Data

Bioconductor · 3.68.2 · limma/man/11RNAseq.Rd · 2026-05-07

This page gives an overview of LIMMA functions to analyze RNA-seq data. voom Transform RNA-seq or ChIP-seq counts to log counts per million (log-cpm) with associated precision weights. After this tranformation, RNA-seq or ChIP-seq data can be analyzed using the same functions as would be used for microarray data. voomWithQualityWeights Combines the functionality of voom and arrayWeights. diffSplice Test for differential transcript or exon usage between experimental conditions. topSplice Show a data.frame of top results from diffSplice. plotSplice Plot results from diffSplice. plotExons Plot logFC for individual exons for a given gene.

Aliases

11.RNAseq

Keywords

documentation

See also

See also the edgeR package for normalization and data summaries of RNA-seq data, as well as for alternative differential expression methods based on the negative binomial distribution. voom accepts DGEList objects and normalization factors from edgeR. The edgeR function voomLmFit is a drop-in replacement for either voom or voomWithQualityWeights. 01.Introduction, 02.Classes, 03.ReadingData, 04.Background, 05.Normalization, 06.LinearModels, 07.SingleChannel, 08.Tests, 09.Diagnostics, 10.GeneSetTests, 11.RNAseq

References

Law, CW, Chen, Y, Shi, W, Smyth, GK (2014). Voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology 15, R29. 10.1186/gb-2014-15-2-r29 Ritchie, ME, Phipson, B, Wu, D, Hu, Y, Law, CW, Shi, W, and Smyth, GK (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research 43, e47. 10.1093/nar/gkv007

EList-class

Expression List (EList) class

Bioconductor · 3.68.2 · class · limma/man/EList.Rd · 2026-05-07

A list-based S4 classes for storing expression values (E-values), for example for a set of one-channel microarrays or a set of RNA-seq samples. EListRaw holds expression values on the raw scale. EList holds expression values on the log scale, usually after background correction and normalization. EListRaw objects are often created by read.maimages, while EList objects are often created by normalizeBetweenArrays or by voom. Alternatively, an EList object can be created directly by new("EList",x), where x is a list.

Aliases

EList-classEListRaw-class

Keywords

classes

Examples

# Two ways to make an EList object: y <- matrix(rnorm(10,5),10,5) rownames(y) <- paste0("Gene",1:10) colnames(y) <- LETTERS[1:5] Genes <- data.frame(Chr=sample(1:21,10)) row.names(Genes) <- row.names(y) # Create the object, than add components: E <- new("EList") E$E <- y E$genes <- Genes # Create with components: E <- new("EList", list(E=y, genes=Genes))

See also

02.Classes gives an overview of all the classes defined by this package. [Biobase:class.ExpressionSet]ExpressionSet is a more formal class in the Biobase package used for the same purpose.

Custom sections

Required Components

These classes contains no slots (other than .Data), but objects should contain a list component E: Enumeric matrix containing expression values. In an EListRaw object, the expression values are unlogged, while in an EList object, they are log2 values. Rows correspond to probes and columns to samples.

Optional Components

Optional components include: Ebnumeric matrix containing unlogged background expression values, of same dimensions as E. For an EListRaw object only. weightsnumeric matrix of same dimensions as E containing relative spot quality weights. Elements should be non-negative. otherlist containing other matrices, all of the same dimensions as E. genesdata.frame containing probe information. Should have one row for each probe. May have any number of columns. targetsdata.frame containing information on the target RNA samples. Rows correspond to samples. May have any number of columns. Valid EList or EListRaw objects may contain other optional components, but all probe or sample information should be contained in the above components.

Methods

These classes inherit directly from class list so any operation appropriate for lists will work on objects of this class. In addition, EList objects can be [limma:subsetting]subsetted and [limma:cbind]combined. EList objects will return dimensions and hence functions such as [limma:dim]dim, [base:nrow]nrow and [base:nrow]ncol are defined. ELists also inherit a [methods]show method from the virtual class [limma:LargeDataObject]LargeDataObject, which means that ELists will print in a compact way.

Author

Gordon Smyth

LargeDataObject-class

Large Data Object - class

Bioconductor · 3.68.2 · class · limma/man/LargeDataObject.Rd · 2026-05-07

A virtual class including the data classes RGList, MAList and MArrayLM, all of which typically contain large quantities of numerical data in vector, matrices and data.frames.

Aliases

LargeDataObject-classshow,LargeDataObject-method

Keywords

classesdata

Examples

# see normalizeBetweenArrays

See also

02.Classes gives an overview of all the classes defined by this package.

Custom sections

Methods

A show method is defined for objects of class LargeDataObject which uses printHead to print only the leading elements or rows of components or slots which contain large quantities of data.

Author

Gordon Smyth

MAList-class

M-value, A-value Expression List - class

Bioconductor · 3.68.2 · class · limma/man/malist.Rd · 2026-05-07

A simple list-based class for storing M-values and A-values for a batch of spotted microarrays. MAList objects are usually created during normalization by the functions normalizeWithinArrays or MA.RG.

Aliases

MAList-class

Keywords

classesdata

See also

02.Classes gives an overview of all the classes defined by this package. marrayNorm is the corresponding class in the marray package.

Custom sections

Slots/List Components

MAList objects can be created by new("MAList",MA) where MA is a list. This class contains no slots (other than .Data), but objects should contain the following components: ll M: numeric matrix containing the M-values (log-2 expression ratios). Rows correspond to spots and columns to arrays. A: numeric matrix containing the A-values (average log-2 expression values). Optional components include: ll weights: numeric matrix of same dimensions as M containing relative spot quality weights. Elements should be non-negative. other: list containing other matrices, all of the same dimensions as M. genes: data.frame containing probe information. Should have one row for each spot. May have any number of columns. targets: data.frame containing information on the target RNA samples. Rows correspond to arrays. May have any number of columns. Usually includes columns Cy3 and Cy5 specifying which RNA was hybridized to each array. printer: list containing information on the process used to print the spots on the arrays. See [limma:PrintLayout]PrintLayout. Valid MAList objects may contain other optional components, but all probe or array information should be contained in the above components.

Methods

This class inherits directly from class list so any operation appropriate for lists will work on objects of this class. In addition, MAList objects can be [=subsetting]subsetted and [=cbind]combined. RGList objects will return dimensions and hence functions such as [limma:dim]dim, [base:nrow]nrow and [base:nrow]ncol are defined. MALists also inherit a [methods]show method from the virtual class [limma:LargeDataObject]LargeDataObject, which means that RGLists will print in a compact way. Other functions in LIMMA which operate on MAList objects include normalizeWithinArrays, normalizeBetweenArrays, normalizeForPrintorder, plotMA and plotPrintTipLoess.

Author

Gordon Smyth

MArrayLM-class

Microarray Linear Model Fit - class

Bioconductor · 3.68.2 · class · limma/man/marraylm.Rd · 2026-05-07

A list-based S4 class for storing the results of fitting gene-wise linear models to a set of microarrays. Objects are normally created by lmFit, and additional components are added by eBayes.

Aliases

MArrayLM-class

Keywords

classesregression

See also

02.Classes gives an overview of all the classes defined by this package.

Custom sections

Components

MArrayLM objects do not contain any slots (apart from .Data) but they should contain the following list components: ll coefficients matrix containing fitted coefficients or contrasts stdev.unscaled matrix containing unscaled standard deviations of the coefficients or contrasts sigma numeric vector containing residual standard deviations for each gene df.residual numeric vector containing residual degrees of freedom for each gene The following additional components may be created by lmFit: ll Amean numeric vector containing the average log-intensity for each probe over all the arrays in the original linear model fit. Note this vector does not change when a contrast is applied to the fit using contrasts.fit. genes data.frame containing probe annotation. design design matrix. cov.coefficients numeric matrix giving the unscaled covariance matrix of the estimable coefficients pivot integer vector giving the order of coefficients in cov.coefficients. Is computed by the QR-decomposition of the design matrix. qr QR-decomposition of the design matrix (if the fit involved no weights or missing values). other components returned by lm.fit (if the fit involved no weights or missing values). The following component may be added by contrasts.fit: ll contrasts numeric matrix defining contrasts of coefficients for which results are desired. The following components may be added by eBayes: ll s2.prior numeric value or vector giving empirical Bayes estimated prior value for residual variances df.prior numeric value or vector giving empirical Bayes estimated degrees of freedom associated with s2.prior for each gene df.total numeric vector giving total degrees of freedom used for each gene, usually equal to df.prior + df.residual. s2.post numeric vector giving posterior residual variances var.prior numeric vector giving empirical Bayes estimated prior variance for each true coefficient F numeric vector giving moderated F-statistics for testing all contrasts equal to zero F.p.value numeric vector giving p-value corresponding to F.stat t numeric matrix containing empirical Bayes t-statistics

Methods

MArrayLM objects will return dimensions and hence functions such as [limma:dim]dim, [base:nrow]nrow and [base:nrow]ncol are defined. MArrayLM objects inherit a show method from the virtual class LargeDataObject. [limma:fitted.MArrayLM]fitted will return a matrix of fitted values and [limma:residuals.MArrayLM]residuals will return a matrix of residuals. The functions eBayes, decideTests and classifyTestsF accept MArrayLM objects as arguments.

Author

Gordon Smyth

PrintLayout

Print Layout - class

Bioconductor · 3.68.2 · class · limma/man/PrintLayout.Rd · 2026-05-07

A list-based class for storing information about the process used to print spots on a microarray. PrintLayout objects can be created using getLayout. The printer component of an RGList or MAList object is of this class.

Aliases

PrintLayout-class

Keywords

classesdata

Examples

# Settings for Swirl and ApoAI example data sets in User's Guide printer <- list(ngrid.r=4, ngrid.c=4, nspot.r=22, nspot.c=24, ndups=1, spacing=1, npins=16, start="topleft") # Typical settings at the Australian Genome Research Facility # Full pin set, duplicates side-by-side on same row printer <- list(ngrid.r=12, ngrid.c=4, nspot.r=20, nspot.c=20, ndups=2, spacing=1, npins=48, start="topright") # Half pin set, duplicates in top and lower half of slide printer <- list(ngrid.r=12, ngrid.c=4, nspot.r=20, nspot.c=20, ndups=2, spacing=9600, npins=24, start="topright")

See also

02.Classes gives an overview of all the classes defined by this package.

Custom sections

Slots/List Components

Objects of this class contains no slots but should contain the following list components: ll ngrid.r: number of grid rows on the arrays ngrid.c: number of grid columns on the arrays nspot.r: number of rows of spots in each grid nspot.c: number of columns of spots in each grid ndups: number of duplicates of each DNA clone, i.e., number of times print-head dips into each well of DNA spacing: number of spots between duplicate spots. Only applicable if ndups>1. spacing=1 for side-by-side spots by rows, spacing=nspot.c for side-by-side spots by columns, spacing=ngrid.r*ngrid.c*nspot.r*nspot.c/2 for duplicate spots in top and bottom halves of each array. npins: actual number of pins or tips on the print-head start: character string giving position of the spot printed first in each grid. Choices are "topleft" or "topright" and partial matches are accepted.

Author

Gordon Smyth

QualityWeights

Spot Quality Weights for Spotted Microarrays

Bioconductor · 3.68.2 · limma/man/qualwt.Rd · 2026-05-07

Functions to calculate quality weights for individual spots based on the image analysis output file for a spotted microarray.

Aliases

QualityWeightswtareawtflagswtIgnore.Filter

Keywords

reading data

Usage

wtarea(ideal = c(160,170)) wtflags(weight = 0, cutoff = 0) wtIgnore.Filter

Arguments

ideal

numeric vector giving the ideal range of areas for good quality spots (in pixels). The minimum and maximum values are used to specify the range of ideal values. All values should be positive.

weight

non-negative weight to be given to flagged spots.

cutoff

cutoff value for Flags below which spots will be downweighted.

Details

These functions can be passed as an argument to read.maimages to construct quality weights as the microarray data is read in. wtarea downweights unusually small or large spots and is designed for SPOT output. It gives weight 1 to spots that have areas in the ideal range, given in pixels, and linearly downweights spots that are smaller or larger than this range. wtflags is designed for GenePix output and gives the specified weight to spots with Flags value less than the cutoff value. Choose cutoff=0 to downweight all flagged spots. Choose cutoff=-50 to downweight bad or absent spots or cutoff=-75 to downweight only spots which have been manually flagged as bad. wtIgnore.Filter is designed for QuantArray output and sets the weights equal to the column Ignore Filter produced by QuantArray. These weights are 0 for spots to be ignored and 1 otherwise.

Value

A function that takes a dataframe or matrix as argument and produces a numeric vector of weights between 0 and 1.

Examples

# Read in spot output files from current directory and give full weight to 165 # pixel spots. Note: for this example to run you must set fnames to the names # of actual spot output files (data not provided). RG <- read.maimages(fnames,source="spot",wt.fun=wtarea(165)) # Spot will be downweighted according to weights found in RG MA <- normalizeWithinArrays(RG,layout)

See also

An overview of LIMMA functions for reading data is given in 03.ReadingData.

Author

Gordon Smyth

RGList-class

Red, Green Intensity List - class

Bioconductor · 3.68.2 · class · limma/man/rglist.Rd · 2026-05-07

A list-based S4 class for storing red and green channel foreground and background intensities for a batch of spotted microarrays. RGList objects are normally created by read.maimages.

Aliases

RGList-classcoerce,RGList,exprSet2-method

Keywords

classesdata

See also

02.Classes gives an overview of all the classes defined by this package. marrayRaw is the corresponding class in the marray package.

Custom sections

Slots/List Components

RGList objects can be created by new("RGList",RG) where RG is a list. Objects of this class contains no slots (other than .Data), but objects should contain the following list components: ll R numeric matrix containing the red (cy5) foreground intensities. Rows correspond to spots and columns to arrays. G numeric matrix containing the green (cy3) foreground intensities. Rows correspond to spots and columns to arrays. Optional components include ll Rb numeric matrix containing the red (cy5) background intensities Gb numeric matrix containing the green (cy3) background intensities weights numeric matrix of same dimension as R containing relative spot quality weights. Elements should be non-negative. other list containing other matrices, all of the same dimensions as R and G. genes data.frame containing probe information. Should have one row for each spot. May have any number of columns. targets data.frame containing information on the target RNA samples. Rows correspond to arrays. May have any number of columns. Usually includes columns Cy3 and Cy5 specifying which RNA was hybridized to each array. printer list containing information on the process used to print the spots on the arrays. See [limma:PrintLayout]PrintLayout. Valid RGList objects may contain other optional components, but all probe or array information should be contained in the above components.

Methods

This class inherits directly from class list so any operation appropriate for lists will work on objects of this class. In addition, RGList objects can be [limma:subsetting]subsetted, [limma:cbind]combined and [limma:merge]merged. RGList objects will return dimensions and hence functions such as [limma:dim]dim, [base:nrow]nrow and [base:nrow]ncol are defined. RGLists also inherit a [methods]show method from the virtual class [limma:LargeDataObject]LargeDataObject, which means that RGLists will print in a compact way. RGList objects can be converted to exprSet2 objects by as(RG,"exprSet2"). Other functions in LIMMA which operate on RGList objects include normalizeBetweenArrays, normalizeForPrintorder, normalizeWithinArrays.

Author

Gordon Smyth

TestResults-class

Matrix of Test Results - class

Bioconductor · 3.68.2 · class · limma/man/TestResults.Rd · 2026-05-07

A matrix-based class for storing the results of simultanous tests. TestResults objects are usually created by decideTests.

Aliases

TestResults-classshow,TestResults-methodsummary.TestResultslevels.TestResultslabels.TestResultsunique.TestResults

Keywords

classeshtest

Usage

3methodsummaryTestResults(object, ) 3methodlabelsTestResults(object, ) 3methodlevelsTestResults(x)

Arguments

object, x

object of class TestResults

other arguments are not used

Examples

# Assume a data object y and a design matrix fit <- lmFit(y, design) fit <- eBayes(fit) results <- decideTests(fit) summary(results)

See also

02.Classes gives an overview of all the classes defined by this package. 08.Tests gives an overview of multiple testing.

Custom sections

Slots/List Components

A TestResults object is essentially a numeric matrix with elements equal to 0, 1 or -1. Zero represents acceptance of the null hypothesis, 1 indicates rejection in favor of the right tail alternative and -1 indicates rejection in favor of the left tail alternative. TestResults objects can be created by new("TestResults",results) where results is a matrix. Objects of this class contain no slots (other than .Data), although the attributes dim and dimnames may be treated as slots.

Methods

This class inherits directly from class matrix so any operation appropriate for matrices will work on objects of this class. [methods]show and summary methods are also implemented. Functions in LIMMA which operate on TestResults objects include heatDiagram, vennCounts, vennDiagram, write.fit.

Author

Gordon Smyth

alias2Symbol

Convert Gene Aliases to Official Gene Symbols

Bioconductor · 3.68.2 · limma/man/alias2Symbol.Rd · 2026-05-07

Maps gene alias names to official gene symbols.

Aliases

alias2Symbolalias2SymbolTablealias2SymbolUsingNCBI

Keywords

gene annotation

Usage

alias2Symbol(alias, species = "Hs", expand.symbols = FALSE) alias2SymbolTable(alias, species = "Hs") alias2SymbolUsingNCBI(alias, gene.info.file, required.columns = c("GeneID","Symbol","description"))

Arguments

alias

character vector of gene aliases

species

expand.symbols

logical. This affects those elements of alias that are the official gene symbol for one gene and also an alias for another gene. If FALSE, then these elements will just return themselves. If TRUE, then all the genes for which they are aliases will also be returned.

gene.info.file

either the name of a gene information file downloaded from the NCBI or a data.frame resulting from reading such a file.

required.columns

character vector of columns from the gene information file that are required in the output.

Details

Aliases are mapped via NCBI Entrez Gene identity numbers using Bioconductor organism packages. alias2Symbol maps a set of aliases to a set of symbols, without necessarily preserving order. The output vector may be longer or shorter than the original vector, because some aliases might not be found and some aliases may map to more than one symbol. alias2SymbolTable returns of vector of the same length as the vector of aliases. If an alias maps to more than one symbol, then the one with the lowest Entrez ID number is returned. If an alias can't be mapped, then NA is returned. species can be any character string XX for which an organism package org.XX.eg.db exists and is installed. The only requirement of the organism package is that it contains objects org.XX.egALIAS2EG and org.XX.egSYMBOL linking the aliases and symbols to Entrez Gene Ids. At the time of writing, the following organism packages are available from Bioconductor 3.16: lll Package Species org.Ag.eg.db Anopheles org.Bt.eg.db Bovine org.Ce.eg.db Worm org.Cf.eg.db Canine org.Dm.eg.db Fly org.Dr.eg.db Zebrafish org.EcK12.eg.db E coli strain K12 org.EcSakai.eg.db E coli strain Sakai org.Gg.eg.db Chicken org.Hs.eg.db Human org.Mm.eg.db Mouse org.Mmu.eg.db Rhesus org.Pt.eg.db Chimp org.Rn.eg.db Rat org.Ss.eg.db Pig org.Xl.eg.db Xenopus alias2SymbolUsingNCBI is analogous to alias2SymbolTable but uses a gene-info file from NCBI instead of a Bioconductor organism package. It also gives the option of returning multiple columns from the gene-info file. NCBI gene-info files can be downloaded from https://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/. For example, the human file is https://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Mammalia/Homo_sapiens.gene_info.gz and the mouse file is ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Mammalia/Mus_musculus.gene_info.gz.

Value

alias2Symbol and alias2SymbolTable produce a character vector of gene symbols. alias2SymbolTable returns a vector of the same length and order as alias, including NA values where no gene symbol was found. alias2Symbol returns an unordered vector that may be longer or shorter than alias. alias2SymbolUsingNCBI returns a data.frame with rows corresponding to the entries of alias and columns as specified by required.columns.

Examples

alias2Symbol(c("PUMA","NOXA","BIM"), species="Hs") alias2Symbol("RS1", expand=TRUE)

See also

This function is often used to assist gene set testing, see 10.GeneSetTests.

Author

Gordon Smyth and Yifang Hu

anova.MAList-method

ANOVA Table - method

Bioconductor · 3.68.2 · methods · limma/man/anova-method.Rd · 2026-05-07

Analysis of variance method for objects of class MAList. Produces an ANOVA table useful for quality assessment by decomposing between and within gene sums of squares for a series of replicate arrays. This method produces a single ANOVA Table rather than one for each gene and is not used to identify differentially expressed genes.

Aliases

anova.MAList

Keywords

models

See also

MAList-class, bwss.matrix, [stats:anova]anova. An overview of quality assessment and diagnostic functions in LIMMA is given by 09.Diagnostics.

Custom sections

Usage

anova(object,design=NULL,ndups=2,)

Arguments

objectobject of class MAList. Missing values in the M-values are not allowed. designnumeric vector or single-column matrix containing the design matrix for linear model. The length of the vector or the number of rows of the matrix should agree with the number of columns of M. ndupsnumber of duplicate spots. Each gene is printed ndups times in adjacent spots on each array. other arguments are not used

Details

This function aids in quality assessment of microarray data and in the comparison of normalization methodologies. It applies only to replicated two-color experiments in which all the arrays are hybridized with the same RNA targets, possibly with dye-swaps, so the design matrix should have only one column. The function has not been heavily used and is somewhat experimental.

Value

An object of class anova containing rows for between genes, between arrays, gene x array interaction, and between duplicate with array sums of squares. Variance components are estimated for each source of variation.

Note

This function does not give valid results in the presence of missing M-values.

Author

Gordon Smyth

arrayWeights

Array Quality Weights

Bioconductor · 3.68.2 · limma/man/arrayWeights.Rd · 2026-05-07

Estimate relative quality weights for each array or group in a multi-array experiment.

Aliases

arrayWeights

Keywords

regressionmodels

Usage

arrayWeights(object, design = NULL, weights = NULL, var.design = NULL, var.group = NULL, prior.n = 10, method = "auto", maxiter = 50, tol = 1e-5, trace = FALSE)

Arguments

object

any matrix-like object containing log-expression values or log-ratio expression values, for example an EList or ExpressionSet object. See help("getEAWP") for a list of possible classes.

design

the design matrix of the microarray experiment, with rows corresponding to arrays and columns to coefficients to be estimated. Defaults to the unit vector meaning that the arrays are treated as replicates.

weights

numeric matrix containing prior weights for each expresson value.

var.design

design matrix for the variance model. Defaults to the sample-specific model whereby each sample has a distinct variance.

var.group

vector or factor indicating groups to have different array weights. This is another way to specify var.design for groupwise variance models.

prior.n

prior number of genes. Larger values squeeze the array weights more strongly towards equality.

method

character string specifying the estimating algorithm to be used. Choices are "genebygene", "reml" or "auto".

maxiter

maximum number of iterations allowed when method="reml".

tol

convergence tolerance when method="reml".

trace

logical. If TRUE then progress information is printed at each iteration of the "reml" algorithm or at every 1000th gene for the "genebygene" algorithm.

Details

The relative reliability of each array is estimated by measuring how well the expression values for that array follow the linear model. Arrays that tend to have larger residuals are assigned lower weights. The basic method is described by Ritchie et al (2006) and the extension to custom variance models by Liu et al (2015). A weighted linear model is fitted to the expression values for each gene. The variance model is fitted to the squared residuals from the linear model fit and is updated either by full REML scoring iterations (method="reml") or using an efficient gene-by-gene update algorithm (method="genebygene"). The final estimates of these array variances are converted to weights. The gene-by-gene algorithm is described by Ritchie et al (2006) while the REML algorithm is an adaption of the algorithm of Smyth (2002). For stability, the array weights are squeezed slightly towards equality. This is done by adding a prior likelihood corresponding to unit array weights equivalent to prior.n genes. The gene-by-gene algorithm is started from the prior genes while the REML algorithm adds the prior to the log-likelihood derivatives. By default, arrayWeights chooses between the REML and gene-by-gene algorithms automatically (method="auto"). REML is chosen if there are no prior weights or missing values and otherwise the gene-by-gene algorithm is used. The input object is interpreted as for lmFit and getEAWP. In particular, the arguments design and weights will be extracted from the data object if available and do not normally need to be set explicitly in the call; if any of these are set in the call then they will over-ride the slots or components in the data object.

Value

A numeric vector of array weights, which multiply to 1.

Examples

ngenes <- 1000 narrays <- 6 y <- matrix(rnorm(ngenes*narrays), ngenes, narrays) var.group <- c(1,1,1,2,2,2) y[,var.group==1] <- 2*y[,var.group==1] arrayWeights(y, var.group=var.group)

See also

arrayWeightsQuick, voomWithQualityWeights An overview of linear model functions in limma is given by 06.LinearModels.

Author

Matthew Ritchie and Gordon Smyth

References

Liu, R., Holik, A. Z., Su, S., Jansz, N., Chen, K., Leong, H. S., Blewitt, M. E., Asselin-Labat, M.-L., Smyth, G. K., Ritchie, M. E. (2015). Why weight? Combining voom with estimates of sample quality improves power in RNA-seq analyses. Nucleic Acids Research 43, e97. https://doi.org/10.1093/nar/gkv412 Ritchie, M. E., Diyagama, D., Neilson, van Laar, R., J., Dobrovic, A., Holloway, A., and Smyth, G. K. (2006). Empirical array quality weights in the analysis of microarray data. BMC Bioinformatics 7, 261. 10.1186/1471-2105-7-261 Smyth, G. K. (2002). An efficient algorithm for REML in heteroscedastic regression. Journal of Computational and Graphical Statistics 11, 836-847. https://gksmyth.github.io/pubs/remlalgo.pdf

arrayWeightsQuick

Array Quality Weights

Bioconductor · 3.68.2 · limma/man/arrayWeightsQuick.Rd · 2026-05-07

Estimates relative quality weights for each array in a multi-array experiment with replication.

Aliases

arrayWeightsQuick

Usage

arrayWeightsQuick(y, fit)

Arguments

the data object used to estimate fit. Can be of any class which can be coerced to matrix, including matrix, MAList, marrayNorm or ExpressionSet.

fit

MArrayLM fitted model object

Details

Estimates the relative reliability of each array by measuring how well the expression values for that array follow the linear model. This is a quick and dirty version of arrayWeights.

Value

Numeric vector of weights of length ncol(fit).

Examples

fit <- lmFit(y, design) arrayWeightsQuick(y, fit)

See also

See arrayWeights. An overview of LIMMA functions for reading data is given in 03.ReadingData.

Author

Gordon Smyth

References

Ritchie, M. E., Diyagama, D., Neilson, van Laar, R., J., Dobrovic, A., Holloway, A., and Smyth, G. K. (2006). Empirical array quality weights in the analysis of microarray data. BMC Bioinformatics 7, 261. 10.1186/1471-2105-7-261

as.MAList

Convert marrayNorm Object to an MAList Object

Bioconductor · 3.68.2 · limma/man/asmalist.Rd · 2026-05-07

Convert marrayNorm Object to an MAList Object

Aliases

as.MAList

Keywords

classesdata

Usage

as.MAList(object)

Arguments

object

an marrayNorm object

Details

The marrayNorm class is defined in the marray package. This function converts a normalized two color microarray data object created by the marray package into the corresponding limma data object. Note that such conversion is not necessary to access the limma linear modelling functions, because lmFit will operate on a marrayNorm data object directly.

Value

Object of class [=MAList-class]MAList

See also

02.Classes gives an overview of all the classes defined by this package. The marrayNorm class is defined in the marray package.

Author

Gordon Smyth

as.data.frame

Turn a Microarray Linear Model Object into a Dataframe

Bioconductor · 3.68.2 · limma/man/asdataframe.Rd · 2026-05-07

Turn a MArrayLM object into a data.frame.

Aliases

as.data.frame.MAListas.data.frame.EListas.data.frame.EListRawas.data.frame.MArrayLM

Keywords

array

Usage

as.data.frameMArrayLM(x, row.names = NULL, optional = FALSE, )

Arguments

an object of class MArrayLM

row.names

NULL or a character vector giving the row names for the data frame. Missing values are not allowed.

optional

logical. If TRUE, setting row names and converting column names (to syntactic names) is optional.

additional arguments to be passed to or from methods.

Details

This method combines all the components of x which have a row for each probe on the array into a data.frame.

Value

A data.frame.

See also

[base]as.data.frame in the base package. 02.Classes gives an overview of data classes used in LIMMA. 06.LinearModels gives an overview of linear model functions in LIMMA.

Author

Gordon Smyth

as.matrix

Turn a Microarray Data Object into a Matrix

Bioconductor · 3.68.2 · limma/man/asmatrix.Rd · 2026-05-07

Turn a microarray data object into a numeric matrix by extracting the expression values.

Aliases

as.matrix.RGListas.matrix.MAListas.matrix.EListRawas.matrix.EListas.matrix.MArrayLMas.matrix.marrayNormas.matrix.PLMsetas.matrix.ExpressionSetas.matrix.LumiBatchas.matrix.vsn

Keywords

array

Usage

as.matrixMAList(x,)

Arguments

an object of class RGList, MAList, EList, MArrayLM, marrayNorm, PLMset, ExpressionSet, LumiBatch or vsn.

additional arguments, not used for these methods.

Details

These methods extract the matrix of log-ratios, for MAList or marrayNorm objects, or the matrix of expression values for other expression objects such as EList or ExressionSet. For MArrayLM objects, the matrix of fitted coefficients is extracted. These methods involve loss of information, so the original data object is not recoverable.

Value

A numeric matrix.

See also

as.matrix in the base package or [Biobase]exprs in the Biobase package. 02.Classes gives an overview of data classes used in LIMMA.

Author

Gordon Smyth

asMatrixWeights

Bioconductor · 3.68.2 · limma/man/asMatrixWeights.Rd · 2026-05-07

Convert probe-weights or array-weights to a matrix of weights.

Aliases

asMatrixWeights

Usage

asMatrixWeights(weights, dim)

Arguments

weights

numeric matrix of weights, rows corresponding to probes and columns to arrays. Or vector of probe weights. Or vector of array weights.

dim

numeric dimension vector of length 2, i.e., the number of probes and the number of arrays.

Details

This function converts a vector or probe-weights or a vector of array-weights to a matrix of the correct size. Probe-weights are repeated across rows while array-weights are repeated down the columns. If weights has length equal to the number of probes, it is assumed to contain probe-weights. If it has length equal to the number of arrays, it is assumed to contain array-weights. If the number of probes is equal to the number of arrays, then weights is assumed to contain array-weights if it is a row-vector of the correct size, i.e., if it is a matrix with one row. This function is used internally by the linear model fitting functions in limma.

Value

Numeric matrix of dimension dim.

Examples

asMatrixWeights(1:3,c(4,3)) asMatrixWeights(1:4,c(4,3))

See also

modifyWeights. An overview of functions in LIMMA used for fitting linear models is given in 06.LinearModels.

Author

Gordon Smyth

auROC

Area Under Receiver Operating Curve

Bioconductor · 3.68.2 · limma/man/auROC.Rd · 2026-05-07

Compute exact area under the ROC for empirical data.

Aliases

auROC

Keywords

htest

Usage

auROC(truth, stat=NULL)

Arguments

truth

logical vector, or numeric vector of 0s and 1s, indicating whether each case is a true positive.

stat

numeric vector containing test statistics used to rank cases, from largest to smallest. If NULL, then truth is assumed to be already sorted in decreasing test statistic order.

Details

A receiver operating curve (ROC) is a plot of sensitivity (true positive rate) versus 1-specificity (false positive rate) for a statistical test or binary classifier. The area under the ROC is a well accepted measure of test performance. It is equivalent to the probability that a randomly chosen pair of cases is corrected ranked. Here we consider a test statistic stat, with larger values being more significant, and a vector truth indicating whether the alternative hypothesis is in fact true. truth==TRUE or truth==1 indicates a true discovery and truth=FALSE or truth=0 indicates a false discovery. Correct ranking here means that truth[i] is greater than or equal to truth[j] when stat[i] is greater than stat[j]. The function computes the exact area under the empirical ROC curve defined by truth when ordered by stat. If stat contains ties, then auROC returns the average area under the ROC for all possible orderings of truth for tied stat values. The area under the curve is undefined if truth is all TRUE or all FALSE or if truth or stat contain missing values.

Value

Numeric value between 0 and 1 giving area under the curve, 1 being perfect and 0 being the minimum.

Examples

auROC(c(1,1,0,0,0)) truth <- rbinom(30,size=1,prob=0.2) stat <- rchisq(30,df=2) auROC(truth,stat)

Author

Gordon Smyth

avearrays

Average Over Replicate Arrays

Bioconductor · 3.68.2 · limma/man/avearrays.Rd · 2026-05-07

Condense a microarray data object so that technical replicate arrays are replaced with (weighted) averages.

Aliases

avearraysavearrays.defaultavearrays.MAListavearrays.EList

Usage

avearraysdefault(x, ID=colnames(x), weights=NULL) avearraysMAList(x, ID=colnames(x), weights=x$weights) avearraysEList(x, ID=colnames(x), weights=x$weights)

Arguments

a matrix-like object, usually a matrix, MAList or EList object.

sample identifier.

weights

numeric matrix of non-negative weights

Details

A new data object is computed in which technical replicate arrays are replaced by their (weighted) averages. For an MAList object, the components M and A are both averaged in this way, as are weights and any matrices found in object$other. EList objects are similar, except that the E component is averaged instead of M and A. If x is of mode "character", then the replicate values are assumed to be equal and the first is taken as the average.

Value

A data object of the same class as x with a column for each unique value of ID.

Examples

x <- matrix(rnorm(8*3),8,3) colnames(x) <- c("a","a","b") avearrays(x)

See also

avereps. 02.Classes gives an overview of data classes used in LIMMA.

Author

Gordon Smyth

avedups

Average Over Duplicate Spots

Bioconductor · 3.68.2 · limma/man/avedups.Rd · 2026-05-07

Condense a microarray data object so that values for within-array replicate spots are replaced with their average.

Aliases

avedupsavedups.defaultavedups.MAListavedups.EList

Keywords

array

Usage

avedupsdefault(x, ndups=2, spacing=1, weights=NULL) avedupsMAList(x, ndups=x$printer$ndups, spacing=x$printer$spacing, weights=x$weights) avedupsEList(x, ndups=x$printer$ndups, spacing=x$printer$spacing, weights=x$weights)

Arguments

a matrix-like object, usually a matrix, MAList or EList object.

ndups

number of within-array replicates for each probe.

spacing

number of spots to step from a probe to its duplicate.

weights

numeric matrix of spot weights.

Details

A new data object is computed in which each probe is represented by the (weighted) average of its duplicate spots. For an MAList object, the components M and A are both averaged in this way. For an EList object, the component E is averaged in this way. If x is of mode "character", then the duplicate values are assumed to be equal and the first is taken as the average.

Value

A data object of the same class as x with 1/ndups as many rows.

See also

avereps. 02.Classes gives an overview of data classes used in LIMMA.

Author

Gordon Smyth

avereps

Average Over Irregular Replicate Probes

Bioconductor · 3.68.2 · limma/man/avereps.Rd · 2026-05-07

Condense a microarray data object so that values for within-array replicate probes are replaced with their average.

Aliases

averepsavereps.defaultavereps.MAListavereps.EListavereps.EListRawavereps.RGList

Keywords

array

Usage

averepsdefault(x, ID=rownames(x), ) averepsMAList(x, ID=NULL, ) averepsEList(x, ID=NULL, )

Arguments

a matrix-like object, usually a matrix, MAList or EList object.

probe identifier.

other arguments are not currently used.

Details

A new data object is computed in which each probe ID is represented by the average of its replicate spots or features. For an MAList object, the components M and A are both averaged in this way, as are weights and any matrices found in object$other. For an MAList object, ID defaults to MA$genes$ID is that exists, otherwise to rownames(MA$M). EList objects are similar, except that the E component is averaged instead of M and A. If x is of mode "character", then the replicate values are assumed to be equal and the first is taken as the average.

Value

A data object of the same class as x with a row for each unique value of ID.

Examples

x <- matrix(rnorm(8*3),8,3) colnames(x) <- c("S1","S2","S3") rownames(x) <- c("b","a","a","c","c","b","b","b") avereps(x)

See also

avedups, avearrays. Also [base]rowsum in the base package. 02.Classes gives an overview of data classes used in LIMMA.

Note

This function should only be applied to normalized log-expression values, and not to raw unlogged expression values. It will generate an error message if applied to RGList or EListRaw objects.

Author

Gordon Smyth

backgroundCorrect

Correct Intensities for Background

Bioconductor · 3.68.2 · limma/man/backgroundcorrect.Rd · 2026-05-07

Background correct microarray expression intensities.

Aliases

backgroundCorrectbackgroundCorrect.matrix

Concepts

background correction

Usage

backgroundCorrect(RG, method="auto", offset=0, printer=RG$printer, normexp.method="saddle", verbose=TRUE) backgroundCorrect.matrix(E, Eb=NULL, method="auto", offset=0, printer=NULL, normexp.method="saddle", verbose=TRUE)

Arguments

a numeric matrix, [limma:EList]EListRaw or [limma:rglist]RGList object.

numeric matrix containing foreground intensities.

numeric matrix containing background intensities.

method

character string specifying correction method. Possible values are "auto", "none", "subtract", "half", "minimum", "movingmin", "edwards" or "normexp". If RG is a matrix, possible values are restricted to "none" or "normexp". The default "auto" is interpreted as "subtract" if background intensities are available or "normexp" if they are not.

offset

numeric value to add to intensities

printer

a list containing printer layout information, see PrintLayout-class. Ignored if RG is a matrix.

normexp.method

character string specifying parameter estimation strategy used by normexp, ignored for other methods. Possible values are "saddle", "mle", "rma" or "rma75".

verbose

logical. If TRUE, progress messages are sent to standard output

Details

This function implements the background correction methods reviewed or developed in Ritchie et al (2007) and Silver at al (2009). Ritchie et al (2007) recommend method="normexp" whenever RG contains local background estimates. Silver et al (2009) shows that either normexp.method="mle" or normexp.method="saddle" are excellent options for normexp. If RG contains morphological background estimates instead (available from SPOT or GenePix image analysis software), then method="subtract" performs well. If method="none" then no correction is done, i.e., the background intensities are treated as zero. If method="subtract" then the background intensities are subtracted from the foreground intensities. This is the traditional background correction method, but is not necessarily recommended. If method="movingmin" then the background estimates are replaced with the minimums of the backgrounds of the spot and its eight neighbors, i.e., the background is replaced by a moving minimum of 3x3 grids of spots. The remaining methods are all designed to produce positive corrected intensities. If method="half" then any intensity which is less than 0.5 after background subtraction is reset to be equal to 0.5. If method="minimum" then any intensity which is zero or negative after background subtraction is set equal to half the minimum of the positive corrected intensities for that array. If method="edwards" a log-linear interpolation method is used to adjust lower intensities as in Edwards (2003). If method="normexp" a convolution of normal and exponential distributions is fitted to the foreground intensities using the background intensities as a covariate, and the expected signal given the observed foreground becomes the corrected intensity. This results in a smooth monotonic transformation of the background subtracted intensities such that all the corrected intensities are positive. The normexp method is available in a number of variants depending on how the model parameters are estimated, and these are selected by normexp.method. Here "saddle" gives the saddle-point approximation to maximum likelihood from Ritchie et al (2007) and improved by Silver et al (2009), "mle" gives exact maximum likelihood from Silver at al (2009), "rma" gives the background correction algorithm from the RMA-algorithm for Affymetrix microarray data as implemented in the affy package, and "rma75" gives the RMA-75 method from McGee and Chen (2006). In practice "mle" performs well and is nearly as fast as "saddle", but "saddle" is the default for backward compatibility. See normexp.fit for more details. The offset can be used to add a constant to the intensities before log-transforming, so that the log-ratios are shrunk towards zero at the lower intensities. This may eliminate or reverse the usual 'fanning' of log-ratios at low intensities associated with local background subtraction. Background correction (background subtraction) is also performed by the normalizeWithinArrays method for RGList objects, so it is not necessary to call backgroundCorrect directly unless one wants to use a method other than simple subtraction. Calling backgroundCorrect before normalizeWithinArrays will over-ride the default background correction.

Value

A matrix, EListRaw or RGList object in which foreground intensities have been background corrected and any components containing background intensities have been removed.

Examples

RG <- new("RGList", list(R=c(1,2,3,4),G=c(1,2,3,4),Rb=c(2,2,2,2),Gb=c(2,2,2,2))) backgroundCorrect(RG) backgroundCorrect(RG, method="half") backgroundCorrect(RG, method="minimum") backgroundCorrect(RG, offset=5)

See also

kooperberg, neqc. An overview of background correction functions is given in 04.Background.

Author

Gordon Smyth

References

Edwards, D. E. (2003). Non-linear normalization and background correction in one-channel cDNA microarray studies Bioinformatics 19, 825-833. McGee, M., and Chen, Z. (2006). Parameter estimation for the exponential-normal convolution model for background correction of Affymetrix GeneChip data. Stat Appl Genet Mol Biol, Volume 5, Article 24. Ritchie, M. E., Silver, J., Oshlack, A., Silver, J., Holmes, M., Diyagama, D., Holloway, A., and Smyth, G. K. (2007). A comparison of background correction methods for two-colour microarrays. Bioinformatics 23, 2700-2707. http://bioinformatics.oxfordjournals.org/content/23/20/2700 Silver, J., Ritchie, M. E., and Smyth, G. K. (2009). Microarray background correction: maximum likelihood estimation for the normal-exponential convolution model. Biostatistics 10, 352-363. http://biostatistics.oxfordjournals.org/content/10/2/352

barcodeplot

Barcode Enrichment Plot

Bioconductor · 3.68.2 · limma/man/barcodeplot.Rd · 2026-05-07

Display the enrichment of one or two gene sets in a ranked gene list.

Aliases

barcodeplot

Usage

barcodeplot(statistics, index = NULL, index2 = NULL, gene.weights = NULL, weights.label = "Weight", labels = c("Down","Up"), quantiles = c(-1,1)*sqrt(2), col.bars = NULL, alpha = 0.4, worm = TRUE, span.worm = 0.45, xlab = "Statistic", )

Arguments

statistics

numeric vector giving the values of statistics to rank genes by.

index

index vector for the gene set. This can be a vector of indices, or a logical vector of the same length as statistics or, in general, any vector such that statistic[index] gives a subset of the statistic values. Can be omitted if gene.weights has same length as statistics, in which case positive values of gene.weights indicate to members of the positive set and negative weights correspond to members of the negative set.

index2

optional index vector for a second (negative) gene set. If specified, then index and index2 specify positive and negative genes respectively. Usually used to distinguish down-regulated genes from up-regulated genes.

gene.weights

numeric vector giving directional weights for the genes in the (first) set. Positive and negative weights correspond to positive and negative genes. Ignored if index2 is non-null.

weights.label

label describing the entries in gene.weights.

labels

character vector of labels for low and high statistics. First label is associated with low statistics or negative statistics and is displayed at the left end of the plot. Second label is associated with high or positive statistics and is displayed at the right end of the plot.

quantiles

numeric vector of length 2, giving cutoff values for statistics considered small or large respectively. Used to color the rectangle of the barcodeplot.

col.bars

character vector of colors for the vertical bars of the barcodeplot showing the ranks of the gene set members. Defaults to "black" for one set or c("red","blue") for two sets.

alpha

transparency for vertical bars. When gene.weights are not NULL, values 0<alpha<1 give semitransparent colors for the vertical bars inside the rectangle. This helps distinguish position bars from the weighted bars and also helps to show the density of the bars when there are many bars. Ignored if gene.weights=NULL.

worm

logical, should enrichment worms be plotted?

span.worm

loess span for enrichment worms. Larger spans give smoother worms.

xlab

x-axis label for statistics.

other arguments are passed to plot.

Details

The function displays the enrichment of a specified gene set signature in a ranked list of genes. The vector statistics defines the ranking of the population of genes. This vector can represent any useful ranking but often it provides t-statistics or a log-fold-changes arising from a differential analysis. The gene set signature is defined either by index and index2 or by gene.weights. The signature can be either unidirectional or bidirectional. A unidirectional signature is a simple set of genes (defined by index), optionally accompanied by a set of positive magnitude scores (specified by gene.weights). Typically this is a set of genes representing a pathway or biological process that are expected to be co-regulated in the same direction. A bidirectional signature consists of a set of up-genes and a set of down-genes (specified by index and index2 respectively) or, more generally, a set of genes with accompanying magnitude scores that are either positive or negative (specified by gene.weights). Technically, this function plots the positions of one or two gene sets in a ranked list of statistics. If there are two sets, then one is considered to be the positive set and the other the down set. For example, the first set and second sets often correspond to genes that are expected to be up- or down-regulated respectively. The function can optionally display varying weights for different genes, for example log-fold-changes from a previous experiment. The statistics are ranked left to right from smallest to largest. The ranked statistics are represented by a shaded bar or bed, and the positions of the specified subsets are marked by vertical bars, forming a pattern like a barcode. An enrichment worm optionally shows the relative enrichment of the vertical bars in each part of the plot. The worm is computed by the tricubeMovingAverage function. Barcode plots are often used in conjunction with gene set tests, and show the enrichment of gene sets amongst high or low ranked genes. They were inspired by the set location plot of Subramanian et al (2005), with a number of enhancements, especially the ability to plot positive and negative sets simultaneously. Barcode plots first appeared in the literature in Lim et al (2009). More recent examples can be seen in Liu et al (2014), Sheikh et al (2015), Witkowski et al (2015) and Ng et al (2015). The function can be used with any of four different calling sequences: index is specified, but not index2 or gene.weights. Single direction plot. index and index2 are specified. Two directional plot. index and gene.weights are specified. gene.weights must have same length as statistics[index]. Plot will be two-directional if gene.weights contains positive and negative values. gene.weights is specified by not index or index2. gene.weights must have same length as statistics. Plot will be two-directional if gene.weights contains positive and negative values.

Value

No value is returned but a plot is produced as a side effect.

Examples

stat <- rnorm(100) sel <- 1:10 sel2 <- 11:20 stat[sel] <- stat[sel]+1 stat[sel2] <- stat[sel2]-1 # One directional barcodeplot(stat, index = sel) # Two directional barcodeplot(stat, index = sel, index2 = sel2) # Second set can be indicated by negative weights barcodeplot(stat, index = c(sel,sel2), gene.weights = c(rep(1,10), rep(-1,10))) # Two directional with unequal weights w <- rep(0,100) w[sel] <- runif(10) w[sel2] <- -runif(10) barcodeplot(stat, gene.weights = w, weights.label = "logFC") # One directional with unequal weights w <- rep(0,100) w[sel2] <- -runif(10) barcodeplot(stat, gene.weights = w, weights.label = "logFC", col.bars = "dodgerblue")

See also

tricubeMovingAverage, roast, camera, romer, geneSetTest There is a topic page on 10.GeneSetTests.

Author

Yifang Hu, Gordon Smyth and Di Wu

References

Ng, AP, Hu, Y, Metcalf, D, Hyland, CD, Ierino, H, Phipson, B, Wu, D, Baldwin, TM, Kauppi, M, Kiu, H, Di, Rago, L, Hilton, DJ, Smyth, GK, Alexander, WS (2015). Early lineage priming by trisomy of Erg leads to myeloproliferation in a down syndrome model. PLOS Genetics 11, e1005211. 10.1371/journal.pgen.1005211 Lim E, Vaillant F, Wu D, Forrest NC, Pal B, Hart AH, Asselin-Labat ML, Gyorki DE, Ward T, Partanen A, Feleppa F, Huschtscha LI, Thorne HJ; kConFab; Fox SB, Yan M, French JD, Brown MA, Smyth GK, Visvader JE, and Lindeman GJ (2009). Aberrant luminal progenitors as the candidate target population for basal tumor development in BRCA1 mutation carriers. Nature Medicine 15, 907-913. 10.1038/nm.2000 Liu, GJ, Cimmino, L, Jude, JG, Hu, Y, Witkowski, MT, McKenzie, MD, Kartal-Kaess, M, Best, SA, Tuohey, L, Liao, Y, Shi, W, Mullighan, CG, Farrar, MA, Nutt, SL, Smyth, GK, Zuber, J, and Dickins, RA (2014). Pax5 loss imposes a reversible differentiation block in B progenitor acute lymphoblastic leukemia. Genes & Development 28, 1337-1350. 10.1101/gad.240416.114 Sheikh, B, Lee, S, El-saafin, F, Vanyai, H, Hu, Y, Pang, SHM, Grabow, S, Strasser, A, Nutt, SL, Alexander, WS, Smyth, GK, Voss, AK, and Thomas, T (2015). MOZ regulates B cell progenitors in mice, consequently, Moz haploinsufficiency dramatically retards MYC-induced lymphoma development. Blood 125, 1910-1921. 10.1182/blood-2014-08-594655 Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, and Mesirov JP (2005). Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA 102, 15545-15550. Witkowski, MT, Cimmino, L, Hu, Y, Trimarchi, T, Tagoh, H, McKenzie, MD, Best, SA, Tuohey, L, Willson, TA, Nutt, SL, Meinrad Busslinger, M, Aifantis, I, Smyth, GK, and Dickins, RA (2015). Activated Notch counteracts Ikaros tumor suppression in mouse and human T cell acute lymphoblastic leukemia. Leukemia 29, 1301-1311. 10.1038/leu.2015.27

beadCountWeights

Bead Count Weights for Illumina BeadChips

Bioconductor · 3.68.2 · limma/man/beadCountWeights.Rd · 2026-05-07

Estimates weights which account for biological variation and technical variation resulting from varying bead numbers.

Aliases

beadCountWeights

Usage

beadCountWeights(y, x, design = NULL, bead.stdev = NULL, bead.stderr = NULL, nbeads = NULL, array.cv = TRUE, scale = FALSE)

Arguments

an "EList" object or a numeric matrix containing normalized log2-expression values.

an "EListRaw" object or a numeric matrix of raw expression values, with same dimensions as y.

design

the design matrix of the microarray experiment, with rows corresponding to arrays and columns to coefficients to be estimated. Defaults to y$design or, if that is NULL, then to a column of ones meaning that the arrays are treated as replicates.

bead.stdev

numeric matrix of bead-level standard deviations.

bead.stderr

numeric matrix of bead-level standard errors. Not required if bead.stdev is set.

nbeads

numeric matrix containing number of beads.

array.cv

logical, should technical variation for each observation be calculated from a constant or array-specific coefficient of variation? The default is to use array-specific coefficients of variation.

scale

logical, should weights be scaled so that the average weight size is the mean of the inverse technical variance along a probe? By default, weights are scaled so that the average weight size along a probe is 1.

Details

This function estimates optimum weights using the bead statistics for each probe for an Illumina expression BeadChip. It can be used with any Illumina expression BeadChip, but is most likely to be useful with HumanHT-12 BeadChips. Arguments x and y are both required. x contains the raw expression values and y contains the corresponding log2 values for the same probes and the same arrays after background correction and normalization. x and y be any type of object that can be coerced to a matrix, with rows corresponding to probes and columns to arrays. x and y must contain the same rows and columns in the same order. The reliability of the normalized expression value for each probe on each array is measured by estimating its technical and biological variability. The bead number weights are the inverse sum of the technical and biological variances. The technical variance for each probe on each array is inversely proportional to the number of beads and is estimated using array-specific bead-level coefficients of variation. Coefficients of variation are calculated using raw expression values. The biological variance for each probe across the arrays are estimated using a Newton iteration, with the assumption that the total residual deviance for each probe from lmFit is inversely proportional to the sum of the technical variance and biological variance. Only one of bead.stdev or bead.stderr needs to be set. If bead.stdev is not provided, then it will be computed as bead.stderr * sqrt(nbeads). If arguments bead.stdev and nbeads are not set explicitly in the call, then they will be extracted fromy$other$BEAD_STDEV and y$other$Avg_NBEADS. An EList object containing these components can be created by read.idat or read.ilmn, see the example code below.

Value

A list object with the following components: weightsnumeric matrix of bead number weights cv.constantnumeric value of constant bead-level coefficient of variation cv.arraynumeric vector of array-specific bead-level coefficient of variation var.technicalnumeric matrix of technical variances var.biologicalnumeric vector of biological variances

Examples

z <- read.ilmn(files="probesummaryprofile.txt", ctrfiles="controlprobesummary.txt", other.columns=c("BEAD_STDEV","Avg_NBEADS")) y <- neqc(z) x <- z[z$genes$Status=="regular",] bcw <- beadCountWeights(y,x,design) fit <- lmFit(y,design,weights=bcw$weights) fit <- eBayes(fit)

See also

read.ilmn, read.idat, neqc. An overview of linear model functions in limma is given by 06.LinearModels.

Author

Charity Law and Gordon Smyth

References

Law CW (2013). Precision weights for gene expression analysis. PhD Thesis. University of Melbourne, Australia. http://hdl.handle.net/11343/38150

blockDiag

Block Diagonal Matrix

Bioconductor · 3.68.2 · limma/man/blockDiag.Rd · 2026-05-07

Form a block diagonal matrix from the given blocks.

Aliases

blockDiag

Usage

blockDiag()

Arguments

numeric matrices

Details

This function is sometimes useful for constructing a design matrix for a disconnected two-color microarray experiment in conjunction with modelMatrix.

Value

A block diagonal matrix with dimensions equal to the sum of the input dimensions

Examples

a <- matrix(1,3,2) b <- matrix(2,2,2) blockDiag(a,b)

See also

modelMatrix

Author

Gordon Smyth

bwss

Between and within sums of squares

Bioconductor · 3.68.2 · limma/man/bwss.Rd · 2026-05-07

Sums of squares between and within groups. Allows for missing values.

Aliases

bwss

Keywords

models

Usage

bwss(x,group)

Arguments

a numeric vector giving the responses.

group

a vector or factor giving the grouping variable.

Details

This is equivalent to one-way analysis of variance.

Value

A list with components bsssums of squares between the group means. wsssums of squares within the groups. bdfdegrees of freedom corresponding to bss. wdfdegrees of freedom corresponding to wss.

See also

bwss.matrix

Author

Gordon Smyth

bwss.matrix

Between and within sums of squares for matrix

Bioconductor · 3.68.2 · limma/man/bwss.matrix.Rd · 2026-05-07

Sums of squares between and within the columns of a matrix. Allows for missing values. This function is called by the [limma:anova-method]anova method for MAList objects.

Aliases

bwss.matrix

Keywords

models

Usage

bwss.matrix(x)

Arguments

a numeric matrix.

Details

This is equivalent to a one-way analysis of variance where the columns of the matrix are the groups. If x is a matrix then bwss.matrix(x) is the same as bwss(x,col(x)) except for speed of execution.

Value

A list with components bsssums of squares between the column means. wsssums of squares within the column means. bdfdegrees of freedom corresponding to bss. wdfdegrees of freedom corresponding to wss.

See also

bwss, anova.MAList

Author

Gordon Smyth

camera

Competitive Gene Set Test Accounting for Inter-gene Correlation

Bioconductor · 3.68.2 · limma/man/camera.Rd · 2026-05-07

Test whether a set of genes is highly ranked relative to other genes in terms of differential expression, accounting for inter-gene correlation.

Aliases

cameracamera.defaultinterGeneCorrelationcameraPRcameraPR.default

Concepts

gene set tests

Usage

cameradefault(y, index, design, contrast = ncol(design), weights = NULL, use.ranks = FALSE, allow.neg.cor=FALSE, inter.gene.cor=0.01, trend.var = FALSE, sort = TRUE, directional = TRUE, ) cameraPRdefault(statistic, index, use.ranks = FALSE, inter.gene.cor=0.01, sort = TRUE, directional = TRUE, ) interGeneCorrelation(y, design)

Arguments

a numeric matrix of log-expression values or log-ratios of expression values, or any data object containing such a matrix. Rows correspond to probes and columns to samples. Any type of object that can be processed by getEAWP is acceptable. NA or infinite values are not allowed.

statistic

a numeric vector of genewise statistics. If index contains gene IDs, then statistic should be a named vector with the gene IDs as names.

index

an index vector or a list of index vectors. Can be any vector such that y[index,] of statistic[index] selects the rows corresponding to the test set. The list can be made using ids2indices.

design

design matrix.

contrast

contrast of the linear model coefficients for which the test is required. Can be an integer specifying a column of design, or else a numeric vector of same length as the number of columns of design.

weights

numeric matrix of precision weights. Can be a matrix of the same size as y, or a numeric vector of array weights with length equal to ncol(y), or a numeric vector of gene weights with length equal to nrow(y).

use.ranks

do a rank-based test (TRUE) or a parametric test (FALSE)?

allow.neg.cor

should reduced variance inflation factors be allowed for negative correlations?

inter.gene.cor

numeric, optional preset value for the inter-gene correlation within tested sets. If NA or NULL, then an inter-gene correlation will be estimated for each tested set.

trend.var

logical, should an empirical Bayes trend be estimated? See eBayes for details.

sort

logical, should the results be sorted by p-value?

directional

are the gene sets directional? If TRUE, will test for genes changing in the same direction within each set. If FALSE, will test for large effects without regard to direction in each set.

other arguments are not currently used

Details

camera and interGeneCorrelation implement methods proposed by Wu and Smyth (2012). camera performs a competitive test in the sense defined by Goeman and Buhlmann (2007). It tests whether the genes in the set are highly ranked in terms of differential expression relative to genes not in the set. It has similar aims to geneSetTest but accounts for inter-gene correlation. See roast for an analogous self-contained gene set test. The function can be used for any microarray experiment which can be represented by a linear model. The design matrix for the experiment is specified as for the lmFit function, and the contrast of interest is specified as for the contrasts.fit function. This allows users to focus on differential expression for any coefficient or contrast in a linear model by giving the vector of test statistic values. camera estimates p-values after adjusting the variance of test statistics by an estimated variance inflation factor. The inflation factor depends on estimated genewise correlation and the number of genes in the gene set. By default, camera uses interGeneCorrelation to estimate the mean pair-wise correlation within each set of genes. camera can alternatively be used with a preset correlation specified by inter.gene.cor that is shared by all sets. This usually works best with a small value, say inter.gene.cor=0.01. If inter.gene.cor=NA, then camera will estimate the inter-gene correlation for each set. In this mode, camera gives rigorous error rate control for all sample sizes and all gene sets. However, in this mode, highly co-regulated gene sets that are biological interpretable may not always be ranked at the top of the list. With the default value inter.gene.cor=0.01, camera will rank biologically interpretable sets more highly. This gives a useful compromise between strict error rate control and interpretable gene set rankings. cameraPR is a "pre-ranked" version of camera where the genes are pre-ranked according to a pre-computed statistic. If direction=TRUE, then the gene sets are assumed to contain genes changing in the same direction and the functions will do two-sided directional tests for each gene set. If direction=FALSE, then the gene sets are assumed to be non-directional, so that significant genes in each set could be changing in different directions. In this case, the functions will test for large changes in each set without regard to direction of change. If direction=FALSE, then use.ranks should be TRUE and statistic should be non-negative.

Value

camera and cameraPR return a data.frame with a row for each set and the following columns: NGenesnumber of genes in set. Correlationinter-gene correlation (only included if the inter.gene.cor was not preset). Directiondirection of change ("Up" or "Down"). PValuetwo-tailed p-value. FDRBenjamini and Hochberg FDR adjusted p-value. interGeneCorrelation returns a list with components: vifvariance inflation factor. correlationinter-gene correlation.

Examples

y <- matrix(rnorm(1000*6),1000,6) design <- cbind(Intercept=1,Group=c(0,0,0,1,1,1)) # First set of 20 genes are genuinely differentially expressed index1 <- 1:20 y[index1,4:6] <- y[index1,4:6]+1 # Second set of 20 genes are not DE index2 <- 21:40 camera(y, index1, design) camera(y, index2, design) camera(y, list(set1=index1,set2=index2), design, inter.gene.cor=NA) camera(y, list(set1=index1,set2=index2), design, inter.gene.cor=0.01) # Pre-ranked version fit <- eBayes(lmFit(y, design)) cameraPR(fit$t[,2], list(set1=index1,set2=index2)) # Non-directional tests cameraPR(abs(fit$t[,2]), list(set1=index1,set2=index2), use.ranks=TRUE, directional=FALSE)

See also

getEAWP rankSumTestWithCorrelation, geneSetTest, roast, fry, romer, ids2indices. There is a topic page on 10.GeneSetTests.

Note

The default settings for inter.gene.cor and allow.neg.cor were changed to the current values in limma 3.29.6. Previously, the default was to estimate an inter-gene correlation for each set. To reproduce the earlier default, use allow.neg.cor=TRUE and inter.gene.cor=NA.

Author

Di Wu and Gordon Smyth

References

Wu D, Smyth GK (2012). Camera: a competitive gene set test accounting for inter-gene correlation. Nucleic Acids Research 40, e133. 10.1093/nar/gks461 Goeman JJ, Buhlmann P (2007). Analyzing gene expression data in terms of gene sets: methodological issues. Bioinformatics 23, 980-987.

cbind

Combine RGList, MAList, EList or EListRaw Objects

Bioconductor · 3.68.2 · limma/man/cbind.Rd · 2026-05-07

Combine a set of RGList, MAList, EList or EListRaw objects.

Aliases

cbindcbind.RGListcbind.MAListcbind.EListcbind.EListRawrbind.RGListrbind.MAListrbind.EListrbind.EListRaw

Keywords

manip

Usage

cbindRGList(, deparse.level=1) rbindRGList(, deparse.level=1)

Arguments

RGList, MAList, EList or EListRaw objects.

deparse.level

not currently used, see [base]cbind in the base package

Details

cbind combines data objects assuming the same probes in the same order but different arrays. rbind combines data objects assuming equivalent arrays, i.e., the same RNA targets, but different probes. For cbind, the matrices of expression data from the individual objects are cbinded. The data.frames of target information, if they exist, are rbinded. The combined data object will preserve any additional components or attributes found in the first object to be combined. For rbind, the matrices of expression data are rbinded while the target information, in any, is unchanged.

Value

An [limma:rglist]RGList, [limma:malist]MAList, [limma:EList]EList or [limma:EList]EListRaw object holding data from all the arrays and all genes from the individual objects.

Examples

M <- A <- matrix(11:14,4,2) rownames(M) <- rownames(A) <- c("a","b","c","d") colnames(M) <- colnames(A) <- c("A1","A2") MA1 <- new("MAList",list(M=M,A=A)) M <- A <- matrix(21:24,4,2) rownames(M) <- rownames(A) <- c("a","b","c","d") colnames(M) <- colnames(A) <- c("B1","B2") MA2 <- new("MAList",list(M=M,A=A)) cbind(MA1,MA2)

See also

[base]cbind in the base package. 03.ReadingData gives an overview of data input and manipulation functions in LIMMA.

Author

Gordon Smyth

changeLog

Change Log

Bioconductor · 3.68.2 · limma/man/changelog.Rd · 2026-05-07

Show the most recent changes from a package change log or NEWS file.

Aliases

changeLog

Keywords

documentation

Usage

changeLog(n = 30, package = "limma")

Arguments

integer, number of lines to write of change log.

package

character string giving name of package.

Details

The function will look for a file changelog.txt or ChangeLog in the top-level or doc directories of the installed package. Failing that, it will look for NEWS or NEWS.md in the top-level directory. Note that changeLog does not write the content of NEWS.Rd, which is a structured file. Use news(package="limma") for that instead.

Value

No value is produced, but a number of lines of text are written to standard output.

Examples

changeLog() changeLog(package="statmod")

See also

01.Introduction, news.

Author

Gordon Smyth

chooseLowessSpan

Choose Span for Local-Weighted Regression Smoothing

Bioconductor · 3.68.2 · limma/man/chooseLowessSpan.Rd · 2026-05-07

Choose an optimal span, depending on the number of points, for lowess smoothing of variance trends.

Aliases

chooseLowessSpan

Concepts

locally-weighted regression smoothing

Usage

chooseLowessSpan(n=1000, small.n=50, min.span=0.3, power=1/3)

Arguments

the number of points the lowess curve will be applied to.

small.n

the span will be set to 1 for any n less than or equal to this value.

min.span

the minimum span for large n.

power

numeric power between 0 and 1 that determines how fast the chosen span decreases with n.

Details

The span is the proportion of points used for each of the local regressions. When there a few points, a large span should be used to ensure a smooth curve. When there are a large number of points, smaller spans can be used because each span window still contains good coverage. By default, the chosen span decreases as the cube-root of the number of points, a rule that is motivated by analogous rules to choose the number of bins for a histogram (Scott, 1979; Freedman & Diaconis, 1981; Hyndman, 1995). The span returned is min.span + (1-min.span) * (small.n/n)^power except that the span is set to 1 for any n less than small.n. Note that the fitted lowess curve will still estimate a trend (i.e., will not be constant) even if span=1. The function is tuned for smoothing of mean-variance trends, for which the trend is usually monotonic, so preference is given to moderately large spans. Even for the very large datasets, the span is always greater than min.span. This function is used to create adaptive spans for voom, vooma and voomaLmFit where n is the number of genes in the analysis.

Value

A numeric vector of length 1 containing the span value.

Examples

chooseLowessSpan(100) chooseLowessSpan(1e6) n <- 10:5000 span <- chooseLowessSpan(n) plot(n,span,type="l",log="x")

See also

loessFit, weightedLowess, lowess, loess. vooma, eBayes, squeezeVar, fitFDistRobustly.

Author

Gordon Smyth

References

Freedman, D. and Diaconis, P. (1981). On the histogram as a density estimator: L_2 theory. Zeitschrift fur Wahrscheinlichkeitstheorie und verwandte Gebiete 57, 453-476. Hyndman, R. J. (1995). The problem with Sturges' rule for constructing histograms. https://robjhyndman.com/papers/sturges.pdf. Scott, D. W. (1979). On optimal and data-based histograms. Biometrika 66, 605-610.

classifyTestsF

Genewise Nested F-Tests

Bioconductor · 3.68.2 · limma/man/classifytestsF.Rd · 2026-05-07

For each gene, classify a series of related t-statistics as significantly up or down using nested F-tests.

Aliases

classifyTestsF

Keywords

htest

Usage

classifyTestsF(object, cor.matrix = NULL, df = Inf, p.value = 0.01, fstat.only = FALSE)

Arguments

object

numeric matrix of t-statistics or an MArrayLM object from which the t-statistics may be extracted.

cor.matrix

covariance matrix of each row of t-statistics. Will be extracted automatically from an MArrayLM object but otherwise defaults to the identity matrix.

numeric vector giving the degrees of freedom for the t-statistics. May have length 1 or length equal to the number of rows of tstat. Will be extracted automatically from an MArrayLM object but otherwise default to Inf.

p.value

numeric value between 0 and 1 giving the desired size of the test.

fstat.only

logical, if TRUE then return the overall F-statistic as for FStat instead of classifying the test results.

Details

classifyTestsF implements the "nestedF" multiple testing option offered by decideTests. Users should generally use decideTests rather than calling classifyTestsF directly because, by itself, classifyTestsF does not incorporate any multiple testing adjustment across genes. Instead it simply tests across contrasts for each gene individually. classifyTestsF uses a nested F-test approach giving particular attention to correctly classifying genes that have two or more significant t-statistics, i.e., which are differentially expressed in two or more conditions. For each row of tstat, the overall F-statistics is constructed from the t-statistics as for FStat. At least one constrast will be classified as significant if and only if the overall F-statistic is significant. If the overall F-statistic is significant, then the function makes a best choice as to which t-statistics contributed to this result. The methodology is based on the principle that any t-statistic should be called significant if the F-test is still significant for that row when all the larger t-statistics are set to the same absolute size as the t-statistic in question. Compared to conventional multiple testing methods, the nested F-test approach achieves better consistency between related contrasts. (For example, if B is judged to be different from C, then at least one of B or C should be different to A.) The approach was first used by Michaud et al (2008). The nested F-test approach provides weak control of the family-wise error rate, i.e., it correctly controls the type I error rate of calling any contrast as significant if all the null hypotheses are true. In other words, it provides error rate control at the overall F-test level but does not provide strict error rate control at the individual contrast level. Usually object is a limma linear model fitted object, from which a matrix of t-statistics can be extracted, but it can also be a numeric matrix of t-statistics. In either case, rows correspond to genes and columns to coefficients or contrasts. If object is a matrix, then it may be necessary to supply values for cor.matrix and df. The cor.matrix is the same as the correlation matrix of the coefficients from which the t-statistics were calculated and df is the degrees of freedom of the t-statistics. All statistics for the same gene must have the same degrees of freedom. If fstat.only=TRUE, the classifyTestsF just returns the vector of overall F-statistics for each gene.

Value

If fstat.only=FALSE, then an object of class [=TestResults-class]TestResults is returned. This is essentially a numeric matrix with elements -1, 0 or 1 depending on whether each t-statistic is classified as significantly negative, not significant or significantly positive respectively. If fstat.only=TRUE, then a numeric vector of F-statistics is returned with attributes df1 and df2 giving the corresponding degrees of freedom.

Examples

TStat <- matrix(c(0,10,0, 0,5,0, -4,-4,4, 2,2,2), 4, 3, byrow=TRUE) colnames(TStat) <- paste0("Contrast",1:3) rownames(TStat) <- paste0("Gene",1:4) classifyTestsF(TStat, df=20) FStat <- classifyTestsF(TStat, df=20, fstat.only=TRUE) P <- pf(FStat, df1=attr(FStat,"df1"), df2=attr(FStat,"df2"), lower.tail=FALSE) data.frame(F.Statistic=FStat,P.Value=P)

See also

An overview of multiple testing functions is given in 08.Tests.

Author

Gordon Smyth

References

Michaud, J, Simpson, KM, Escher, R, Buchet-Poyau, K, Beissbarth, T, Carmichael, C, Ritchie, ME, Schutz, F, Cannon, P, Liu, M, Shen, X, Ito, Y, Raskind, WH, Horwitz, MS, Osato, M, Turner, DR, Speed, TP, Kavallaris, M, Smyth, GK, and Scott, HS (2008). Integrative analysis of RUNX1 downstream pathways and target genes. BMC Genomics 9, 363.

contrastAsCoef

Reform a Design Matrix to that Contrasts Become Coefficients

Bioconductor · 3.68.2 · limma/man/contrastAsCoef.Rd · 2026-05-07

Reform a design matrix so that one or more coefficients from the new matrix correspond to specified contrasts of coefficients from the old matrix.

Aliases

contrastAsCoef

Keywords

regression

Usage

contrastAsCoef(design, contrast=NULL, first=TRUE)

Arguments

design

numeric design matrix.

contrast

numeric matrix with rows corresponding to columns of the design matrix (coefficients) and columns containing contrasts. May be a vector if there is only one contrast.

first

logical, should coefficients corresponding to contrasts be the first columns (TRUE) or last columns (FALSE) of the output design matrix.

Details

If the contrasts contained in the columns of contrast are not linearly dependent, then superfluous columns are dropped until the remaining matrix has full column rank. The number of retained contrasts is stored in qr$rank and the retained columns are given by qr$pivot.

Value

A list with components designreformed design matrix coefcolumns of design matrix which hold the meaningful coefficients qrQR-decomposition of contrast matrix

Examples

design <- cbind(1,c(0,0,1,1,0,0),c(0,0,0,0,1,1)) cont <- c(0,-1,1) design2 <- contrastAsCoef(design, cont)$design # Original coef[3]-coef[2] becomes coef[1] y <- rnorm(6) fit1 <- lm(y~0+design) fit2 <- lm(y~0+design2) coef(fit1) coef(fit1) %*% cont coef(fit2)

See also

[stats]model.matrix in the stats package. An overview of linear model functions in limma is given by 06.LinearModels.

Author

Gordon Smyth

contrasts.fit

Compute Contrasts from Linear Model Fit

Bioconductor · 3.68.2 · limma/man/contrasts.fit.Rd · 2026-05-07

Given a linear model fit to microarray data, compute estimated coefficients and standard errors for a given set of contrasts.

Aliases

contrasts.fit

Keywords

htest

Usage

contrasts.fit(fit, contrasts=NULL, coefficients=NULL)

Arguments

fit

an [limma:marraylm]MArrayLM object or a list object produced by the function lm.series or equivalent. Must contain components coefficients and stdev.unscaled.

contrasts

numeric matrix with rows corresponding to coefficients in fit and columns containing contrasts. May be a vector if there is only one contrast. NAs are not allowed.

coefficients

vector indicating which coefficients are to be kept in the revised fit object. An alternative way to specify the contrasts.

Details

This function accepts input from any of the functions lmFit, lm.series, mrlm, gls.series or lmscFit. The function re-orientates the fitted model object from the coefficients of the original design matrix to any set of contrasts of the original coefficients. The coefficients, unscaled standard deviations and correlation matrix are re-calculated in terms of the contrasts. The idea of this function is to fit a full-rank model using lmFit or equivalent, then use contrasts.fit to obtain coefficients and standard errors for any number of contrasts of the coefficients of the original model. Unlike the design matrix input to lmFit, which normally has one column for each treatment in the experiment, the matrix contrasts may have any number of columns and these are not required to be linearly independent. Methods of assessing differential expression, such as eBayes or classifyTestsF, can then be applied to fitted model object. The coefficients argument provides a simpler way to specify the contrasts matrix when the desired contrasts are just a subset of the original coefficients.

Value

An list object of the same class as fit, usually [limma:marraylm]MArrayLM. This is a list with components coefficientsnumeric matrix containing the estimated coefficients for each contrast for each probe. stdev.unscalednumeric matrix conformal with coef containing the unscaled standard deviations for the coefficient estimators. cov.coefficientsnumeric matrix giving the unscaled covariance matrix of the estimable coefficients. Most other components found in fit are passed through unchanged, but t, p.value, lods, F and F.p.value will all be removed.

Examples

# Simulate gene expression data: 6 microarrays and 100 genes # with one gene differentially expressed in first 3 arrays M <- matrix(rnorm(100*6,sd=0.3),100,6) M[1,1:3] <- M[1,1:3] + 2 # Design matrix corresponds to oneway layout, columns are orthogonal design <- cbind(First3Arrays=c(1,1,1,0,0,0),Last3Arrays=c(0,0,0,1,1,1)) fit <- lmFit(M,design=design) # Would like to consider original two estimates plus difference between first 3 and last 3 arrays contrast.matrix <- cbind(First3=c(1,0),Last3=c(0,1),"Last3-First3"=c(-1,1)) fit2 <- contrasts.fit(fit,contrast.matrix) fit2 <- eBayes(fit2) # Large values of eb$t indicate differential expression results <- decideTests(fit2, method="nestedF") vennCounts(results)

See also

An overview of linear model functions in limma is given by 06.LinearModels.

Note

For efficiency reasons, this function does not re-factorize the design matrix for each probe. A consequence is that, if the design matrix is non-orthogonal and the original fit included precision weights or missing values, then the unscaled standard deviations produced by this function are approximate rather than exact. The approximation is usually acceptable. If not, then the issue can be avoided by redefining the design matrix to fit the contrasts directly. Even with precision weights or missing values, the results from contrasts.fit are always exact if the coefficients being compared are statistically independent. This will be true, for example, if the original fit was a oneway model without blocking and the group-means (no-intercept) parametrization was used for the design matrix.

Author

Gordon Smyth

controlStatus

Set Status of each Spot from List of Spot Types

Bioconductor · 3.68.2 · limma/man/controlStatus.Rd · 2026-05-07

Determine the type (or status) of each spot in the gene list.

Aliases

controlStatus

Keywords

Usage

controlStatus(types, genes, spottypecol="SpotType", regexpcol, verbose=TRUE)

Arguments

types

dataframe containing spot type specifiers, usually input using readSpotTypes.

genes

dataframe containing gene annotation, or an object of class RGList, MAList, EListRaw, EList or MArrayLM from which the gene annotation can be extracted.

spottypecol

integer or name specifying column of types containing spot type names.

regexpcol

vector of integers or column names specifying columns of types containing regular expressions. Defaults to any column names in common between types and genes.

verbose

logical, if TRUE then progess on pattern matching is reported to the standard output channel.

Details

This function constructs a vector of status codes by searching for patterns in the gene list. The data frame genes contains gene IDs and should have as many rows as there are spots on the microarrays. Such a data frame is often read using readGAL. The data frame types has as many rows as you want to distinguish types of spots in the gene list. This data frame should contain a column or columns, the regexpcol columns, which have the same names as columns in genes and which contain patterns to match in the gene list. Another column, the spottypecol, contains the names of the spot types. Any other columns are assumed to contain plotting parameters, such as colors or symbols, to be associated with the spot types. The patterns in the regexpcol columns are simplified regular expressions. For example, AA* means any string starting with AA, *AA means any code ending with AA, AA means exactly these two letters, *AA* means any string containing AA, AA. means AA followed by exactly one other character and AA\. means exactly AA followed by a period and no other characters. Any other regular expressions are allowed but the codes ^ for beginning of string and $ for end of string should not be included. Note that the patterns are matched sequentially from first to last, so more general patterns should be included first. For example, it is often a good idea to include a default spot-type as the first line in types with pattern * for all regexpcol columns and default plotting parameters.

Value

Character vector specifying the type (or status) of each spot on the array. Attributes contain plotting parameters associated with each spot type.

Examples

genes <- data.frame( ID=c("Control","Control","Control","Control","AA1","AA2","AA3","AA4"), Name=c("Ratio 1","Ratio 2","House keeping 1","House keeping 2", "Gene 1","Gene 2","Gene 3","Gene 4")) types <- data.frame( SpotType=c("Gene","Ratio","Housekeeping"), ID=c("*","Control","Control"), Name=c("*","Ratio*","House keeping*"), col=c("black","red","blue")) status <- controlStatus(types,genes)

See also

An overview of LIMMA functions for reading data is given in 03.ReadingData.

Author

Gordon Smyth

coolmap

Heatmap of gene expression values

Bioconductor · 3.68.2 · limma/man/coolmap.Rd · 2026-05-07

Create a heatmap of a matrix of log-expression values.

Aliases

coolmap

Usage

coolmap(x, cluster.by="de pattern", col=NULL, linkage.row="complete", linkage.col="complete", show.dendrogram="both", ...)

Arguments

any data object that can be coerced to a matrix of log-expression values, for example an ExpressionSet or EList. Rows represent genes and columns represent RNA samples.

cluster.by

choices are "de pattern" or "expression level". In the former case, the intention is to cluster by relative changes in expression, so genes are clustered by Pearson correlation and log-expression values are mean-corrected by rows for the plot. In the latter case, the intention is to cluster by absolute expression, so genes are clustered by Euclidean and log-expression values are not mean-corrected.

col

character vector specifying the color panel. Can be either the name of the panel or a vector of R colors that can be passed directly to the heatmap.2 function. Possible panel names are "redblue", "redgreen", "yellowblue" or "whitered". Defaults to "redblue" if cluster.by="de pattern" or "yellowblue" if cluster.by="expression level".

linkage.row

linkage criterion used to cluster the rows. Choices are "none", "ward", "ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median" or "centroid", with "ward" treated as "ward.D2".

linkage.col

linkage criterion used to cluster the columns. Choices are the same as for linkage.row.

show.dendrogram

choices are "row", "column", "both" or "none".

any other arguments are passed to heatmap.2. See details for which arguments are reserved.

Details

This function calls the heatmap.2 function in the gplots package with sensible argument settings for genomic log-expression data. The default settings for heatmap.2 are often not ideal for expression data, and overriding the defaults requires explicit calls to hclust and as.dendrogram as well as prior standardization of the data values. The coolmap function implements our preferred defaults for the two most common types of heatmaps. When clustering by relative expression (cluster.by="de pattern"), it implements a row standardization that takes account of NA values and standard deviations that might be zero. coolmap sets the following heatmap.2 arguments internally: Rowv, Colv, scale, density.info, trace, col, symbreaks, symkey, dendrogram, key.title and key.xlab. These arguments are therefore reserved and cannot be varied. Other than these reserved arguments, any other heatmap.2 argument can be included in the coolmap call, thereby giving full access to heatmap.2 functionality.

Value

A plot is created on the current graphics device. A list is also invisibly returned, see [gplots]heatmap.2 for details.

Examples

# Simulate gene expression data for 50 genes and 6 microarrays. # Samples are in two groups # First 50 probes are differentially expressed in second group ngenes <- 50 sd <- 0.3*sqrt(4/rchisq(ngenes,df=4)) x <- matrix(rnorm(ngenes*6,sd=sd),ngenes,6) rownames(x) <- paste("Gene",1:ngenes) x <- x + seq(from=0, to=16, length=ngenes) x[,4:6] <- x[,4:6] + 2 coolmap(x)

See also

[gplots]heatmap.2, hclust, dist. An overview of diagnostic functions available in LIMMA is given in 09.Diagnostics.

Author

Gordon Smyth

cumOverlap

Cumulative Overlap Analysis of Ordered Lists

Bioconductor · 3.68.2 · limma/man/cumOverlap.Rd · 2026-05-07

Test whether the leading members of ordered lists significantly overlap.

Aliases

cumOverlap

Concepts

gene set enrichment analysis

Usage

cumOverlap(ol1, ol2)

Arguments

ol1

vector containing first ordered list. Duplicate values not allowed.

ol2

vector containing second ordered list. Should contain the same values as found in ol1 but in a possibly different order. Duplicate values not allowed.

Details

The function compares the top n members of each list, for every possible n, and conducts an hypergeometric test for overlap. The function returns the value of n giving the smallest p-value. The p-values are adjusted for multiple testing in a similar way to Bonferroni's method, but starting from the top of th e ranked list instead of from the smallest p-values. This approach is designed to be sensitive to contexts where the number of Ids involved in the significant overlap are a small proportion of the total. The vectors ol1 and ol2 do not need to be of the same length, but only values in common between the two vectors will be used in the calculation. This method was described in Chapter 4 of Wu (2011).

Value

List containing the following components: n.totalinteger, total number of values in common between ol1 and ol2. n.mininteger, top table length leading to smallest adjusted p-value. p.minsmallest adjusted p-value. n.overlapinteger, number of overlapping IDs in first n.min. id.overlapvector giving the overlapping IDs in first n.min. p.valuenumeric, vector of p-values for each possible top table length. adj.p.valuenumeric, vector of Bonferroni adjusted p-values for each possible top table length.

Examples

GeneIds <- paste0("Gene",1:50) ol1 <- GeneIds ol2 <- c(sample(GeneIds[1:5]), sample(GeneIds[6:50])) coa <- cumOverlap(ol1, ol2) coa$p.min coa$id.overlap

Author

Gordon Smyth and Di Wu

References

Wu, D (2011). Finding hidden relationships between gene expression profiles with application to breast cancer biology. PhD thesis, University of Melbourne. http://hdl.handle.net/11343/36278

decideTests

Multiple Testing Across Genes and Contrasts

Bioconductor · 3.68.2 · limma/man/decideTests.Rd · 2026-05-07

Identify which genes are significantly differentially expressed for each contrast from a fit object containing p-values and test statistics. A number of different multiple testing strategies are offered that adjust for multiple testing down the genes as well as across contrasts for each gene.

Aliases

decideTestsdecideTests.defaultdecideTests.MArrayLM

Keywords

htest

Usage

decideTestsMArrayLM(object, method = "separate", adjust.method = "BH", p.value = 0.05, lfc = 0, ) decideTestsdefault(object, method = "separate", adjust.method = "BH", p.value = 0.05, lfc = 0, coefficients = NULL, cor.matrix = NULL, tstat = NULL, df = Inf, genewise.p.value = NULL, )

Arguments

object

a numeric matrix of p-values or an MArrayLM object from which p-values and t-statistics can be extracted.

method

character string specifying how genes and contrasts are to be combined in the multiple testing scheme. Choices are "separate", "global", "hierarchical" or "nestedF".

adjust.method

character string specifying p-value adjustment method. Possible values are "none", "BH", "fdr" (equivalent to "BH"), "BY" and "holm". See [stats]p.adjust for details.

p.value

numeric value between 0 and 1 giving the required family-wise error rate or false discovery rate.

lfc

numeric, minimum absolute log2-fold-change required.

coefficients

numeric matrix of coefficients or log2-fold-changes. Of same dimensions as object.

cor.matrix

correlation matrix of coefficients. Square matrix of dimension ncol(object).

tstat

numeric matrix of t-statistics. Of same dimensions as object.

numeric vector of length nrow(object) giving degrees of freedom for the t-statistics.

genewise.p.value

numeric vector of length nrow(object) containing summary gene-level p-values for use with method="hierarchical".

other arguments are not used.

Details

This function can be applied to a matrix of p-values but is more often applied to an MArrayLM fit object produced by eBayes or treat. In either case, rows of object correspond to genes and columns to coefficients or contrasts. This function applies a multiple testing procedure and a significance level cutoff to the statistics contained in object. It implements a number of multiple testing procedures for determining whether each statistic should be considered significantly different from zero. method="separate" will apply multiple testing adjustments to each column of p-values separately. Setting method="separate" is equivalent to using topTable separately for each coefficient in the linear model fit and will identify the same probes as significantly differentially expressed if adjust.method is the same. method="global" will treat the entire matrix of t-statistics as a single vector of unrelated tests. method="hierarchical" adjusts down genes and then across contrasts. method="nestedF" adjusts down genes according to overall F-tests and then uses classifyTestsF to classify contrasts as significant or not for the selected genes. The default method="separate" and adjust.method="BH" settings are appropriate for most analyses. method="global" is useful when it is important that the same t-statistic cutoff should correspond to statistical significance for all the contrasts. The "nestedF" method was proposed by Michaud et al (2008) and achieves better consistency between contrasts than the other methods. It provides formal error rate control at the gene level but not for individual contrasts. See the classifyTestsF help page for more detail about the "nestedF" method. If object is a MArrayLM linear model fit, then the "hierarchical" method conducts row-wise F-tests and then proceeds to t-tests for those rows with significant F-tests. The multiple testing adjustment is applied initially to the F-tests and then, with an adjusted level, to the t-tests for each significant row. Also see the limma User's Guide for a discussion of the statistical properties of the various adjustment methods.

Value

An object of class [=TestResults-class]TestResults. This is essentially a numeric matrix with elements -1, 0 or 1 depending on whether each t-statistic is classified as significantly negative, not significant or significantly positive. If lfc>0 then contrasts are judged significant only when the log2-fold change is at least this large in absolute value. For example, one might choose lfc=log2(1.5) to restrict to 50% changes or lfc=1 for 2-fold changes. In this case, contrasts must satisfy both the p-value and the fold-change cutoff to be judged significant.

See also

An overview of multiple testing functions is given in 08.Tests.

Note

Although this function enables users to set p-value and lfc cutoffs simultaneously, this combination criterion is not recommended. logFC cutoffs tend to favor low expressed genes and thereby reduce rather than increase biological significance. Unless the fold changes and p-values are very highly correlated, the addition of a fold change cutoff can increase the family-wise error rate or false discovery rate above the nominal level. Users wanting to use fold change thresholding are recommended to use treat instead of eBayes and to leave lfc at the default value when using decideTests.

Author

Gordon Smyth

References

Michaud J, Simpson KM, Escher R, Buchet-Poyau K, Beissbarth T, Carmichael C, Ritchie ME, Schutz F, Cannon P, Liu M, Shen X, Ito Y, Raskind WH, Horwitz MS, Osato M, Turner DR, Speed TP, Kavallaris M, Smyth GK, Scott HS (2008). Integrative analysis of RUNX1 downstream pathways and target genes. BMC Genomics 9, 363. 10.1186/1471-2164-9-363

designI2M

Convert Individual Channel Design Matrix to M-A Format

Bioconductor · 3.68.2 · limma/man/channel2M.Rd · 2026-05-07

Convert a design matrix in terms of individual channels to ones in terms of M-values or A-values for two-color microarray data.

Aliases

designI2MdesignI2A

Concepts

separate channel analysis

Usage

designI2M(design) designI2A(design)

Arguments

design

numeric model matrix with one row for each channel observation, i.e., twice as many rows as arrays

Details

If design is a model matrix suitable for modelling individual log-intensities for two color microarray data, then designI2M computes the corresponding model matrix for modelling M-values (log-ratios) and designI2A computes the model matrix for modelling A-values (average log-intensities). Note that the matrices designI2M(design) or designI2A(design) may be singular if not all of the coefficients are estimable from the M or A-values. In that case there will be columns containing entirely zeros.

Value

numeric model matrix with half as many rows as design

Examples

X <- cbind(1,c(1,1,1,1,0,0,0,0),c(0,0,0,0,1,1,1,1)) designI2M(X) designI2A(X)

See also

[stats]model.matrix in the stats package. An overview of individual channel linear model functions in limma is given by 07.SingleChannel.

Author

Gordon Smyth

detectionPValues

Detection P-Values from Negative Controls

Bioconductor · 3.68.2 · limma/man/detectionPValue.Rd · 2026-05-07

Compute the proportion of negative controls greater than each observed expression value. Particularly useful for Illumina BeadChips.

Aliases

detectionPValues.defaultdetectionPValues.EListRawdetectionPValues

Keywords

background correctionillumina beadchips

Usage

detectionPValuesEListRaw(x, status = NULL, ) detectionPValuesdefault(x, status, negctrl = "negative", )

Arguments

object of class EListRaw or a numeric matrix containing raw intensities for regular and control probes from a series of microarrays.

status

character vector giving probe types. Defaults to x$genes$Status if x is an EListRaw object.

negctrl

character string identifier for negative control probes.

other arguments are not currently used.

Details

The rows of x for which status == negctrl are assumed to correspond to negative control probes. For each column of x, the detection p-values are defined as (N.eq/2 + N.gt) / N.neg, where N.gt is the number of negative controls with expression greater than the observed value, N.eq is the number of negative controls with expression equal to the observed value, and N.neg is the total number of negative controls. When used on Illumina BeadChip data, this function produces essentially the same detection p-values as returned by Illumina's GenomeStudio software.

Value

numeric matrix of same dimensions as x containing detection p-values.

Examples

# Read Illumina binary IDAT files x <- read.idat(idat, bgx) x$other$Detection <- detectionPValues(x) y <- neqc(x)

See also

An overview of LIMMA functions to read expression data is given in 03.ReadingData. read.idat reads Illumina BeadChip expression data from binary IDAT files. neqc performs normexp background correction and quantile normalization aided by control probes.

Author

Gordon Smyth

References

Shi W, de Graaf C, Kinkel S, Achtman A, Baldwin T, Schofield L, Scott H, Hilton D, Smyth GK (2010). Estimating the proportion of microarray probes expressed in an RNA sample. Nucleic Acids Research 38(7), 2168-2176. 10.1093/nar/gkp1204

diffSplice

Differential Transcript or Exon Usage

Bioconductor · 3.68.2 · limma/man/diffSplice.Rd · 2026-05-07

Aliases

diffSplicediffSplice.MArrayLM

Concepts

RNA-seqDifferential transcript usageDifferential usage

Usage

diffSpliceMArrayLM(fit, geneid, exonid = NULL, robust = FALSE, legacy = FALSE, verbose = TRUE, )

Arguments

fit

an MArrayLM fitted model object produced by voomLmFit, lmFit or contrasts.fit. Rows should correspond to transcripts, for a DTU analysis, or to exons and exon-exon junctions for a DEU analysis.

geneid

gene identifiers. Either a vector of length nrow(fit) or the name of the column of fit$genes containing the gene identifiers. Rows with the same ID are assumed to belong to the same gene.

exonid

feature or exon identifiers. Either a vector of length nrow(fit) or the name of the column of fit$genes containing the feature identifiers.

robust

logical, should the estimation of the empirical Bayes prior parameters be robustified against outlier sample variances?

legacy

logical. If FALSE then the new empirical Bayes hyperparameter estimation (introduced in limma 3.61.8) will be used, if TRUE the earlier hyperparameter estimation will be used. The new method is particularly appropriate when the residual degrees of freedom are not all equal, which is likely to be the case for diffSplice.

verbose

logical, if TRUE some diagnostic information about the number of genes and exons is output.

other arguments are not currently used.

Details

Given a limma fit object with multiple features per gene, test for differential usage of those features within genes between experimental conditions. The features may be any set of isoform-identifying features such as transcripts, exons or exon-junctions. The features can be transcripts for a differential transcript usage (DTU) analysis (Baldoni et al 2025), or can be a combination of exons and exon-exon junctions for a differential exon usage (DEU) analysis. Another possibility is for the features to be transcript-overlap equivalence classes (Cmero et al 2019). Testing for differential usage is equivalent to testing whether the log-fold-changes in the fit differ between features for the same gene. Two different tests are provided. The first is a F-test for differences between the log-fold-changes for each gene. This is equivalent to testing for interaction between the features for that gene and the coefficient of the linear model. The other is a series of t-tests in which each transcript is compared to the weighted average of all other features for the same gene. The feature-level t-tests are converted into genewise tests by adjusting the p-values for the same gene by Simes method. As an alternative, the feature-level t-tests are also converted into genewise tests by adjusting the smallest p-value for each gene by Bonferroni's method. This function can be used on transcript level RNA-seq counts from Salmon or kallisto, after pre-processing by the edgeR functions catchSalmon() (or catchKallisto or catchRSEM) and voomLmFit(), as described by Baldoni et al (2025). It can also be used on equivalence-class counts from Salmon or kallisto, after pre-processing by voomLmFit(), as described by Cmero et al (2019). It can also be used on exon-level read counts or on data from an exon microarray.

Value

An object of class MArrayLM containing both exon level and gene level tests. Results are sorted by geneid and by exonid within gene. coefficientsnumeric matrix of coefficients of same dimensions as fit. Each coefficient is the difference between the log-fold-change for that exon versus the average log-fold-change for all other exons for the same gene. tnumeric matrix of moderated t-statistics, of same dimensions as fit. p.valuenumeric vector of p-values corresponding to the t-statistics genesdata.frame of exon annotation genecolnamecharacter string giving the name of the column of genes containing gene IDs gene.Fnumeric matrix of moderated F-statistics, one row for each gene. gene.F.p.valuenumeric matrix of p-values corresponding to gene.F gene.simes.p.valuenumeric matrix of Simes adjusted p-values, one row for each gene. gene.bonferroni.p.valuenumeric matrix of Bonferroni adjusted p-values, one row for each gene. gene.genesdata.frame of gene annotation.

Examples

fit <- voomLmFit(dge, design) ex <- diffSplice(fit, geneid="GeneID") topSplice(ex) plotSplice(ex, xlab="Transcript")

See also

topSplice and plotSplice are downstream functions that operate on the output from diffSplice. Also see diffSplice.DGEGLM in the edgeR package, which has comparable functionality but for edgeR fit objects. A summary of functions available in LIMMA for RNA-seq analysis is given in 11.RNAseq.

Note

This function is not designed for situations with a very high level of multi-counting of RNA-seq reads that overlap two or more exons. In particular, it is not designed for use with "chopped" exons, where overlapping exons belonging to different transcripts of the same gene are chopped up into unique sub-exons, because artificial exons of this sort lead to high levels of multi-counting.

Author

Gordon Smyth and Charity Law

References

Baldoni PL, Chen L, Li M, Chen Y, Smyth GK (2025). Dividing out quantification uncertainty enables assessment of differential transcript usage with limma and edgeR. Nucleic Acids Research. bioRxiv 10.1101/2025.04.07.647659. Cmero M, Davidson NM, Oshlack A (2019). Using equivalence class counts for fast and accurate testing of differential transcript usage. F1000Research 8, 265. 10.12688/f1000research.18276.2.

dim

Retrieve the Dimensions of an RGList, MAList or MArrayLM Object

Bioconductor · 3.68.2 · limma/man/dim.Rd · 2026-05-07

Retrieve the number of rows (genes) and columns (arrays) for an RGList, MAList or MArrayLM object.

Aliases

dim.RGListdim.MAListdim.EListRawdim.EListdim.MArrayLM

Keywords

array

Usage

dimRGList(x)

Arguments

an object of class RGList, MAList or MArrayLM

Details

Microarray data objects share many analogies with ordinary matrices in which the rows correspond to spots or genes and the columns to arrays. These methods allow one to extract the size of microarray data objects in the same way that one would do for ordinary matrices. A consequence is that row and column commands nrow(x), ncol(x) and so on also work.

Value

Numeric vector of length 2. The first element is the number of rows (genes) and the second is the number of columns (arrays).

Examples

M <- A <- matrix(11:14,4,2) rownames(M) <- rownames(A) <- c("a","b","c","d") colnames(M) <- colnames(A) <- c("A1","A2") MA <- new("MAList",list(M=M,A=A)) dim(M) ncol(M) nrow(M)

See also

[base]dim in the base package. 02.Classes gives an overview of data classes used in LIMMA.

Author

Gordon Smyth

dimnames

Retrieve the Dimension Names of an RGList, MAList, EList, EListRaw or MArrayLM Object

Bioconductor · 3.68.2 · limma/man/dimnames.Rd · 2026-05-07

Retrieve the dimension names of a microarray data object.

Aliases

dimnames.RGListdimnames.MAListdimnames.EListRawdimnames.EListdimnames.MArrayLMdimnames<-.RGListdimnames<-.MAListdimnames<-.EListdimnames<-.EListRaw

Keywords

array

Usage

dimnamesRGList(x) dimnamesRGList(x) <- value

Arguments

an object of class RGList, MAList, EList, EListRaw or (not for assignment) MArrayLM

value

a possible value for dimnames(x): see dimnames

Details

The dimension names of a microarray object are the same as those of the most important matrix component of that object. A consequence is that rownames and colnames will work as expected.

Value

Either NULL or a list of length 2. If a list, its components are either NULL or a character vector the length of the appropriate dimension of x.

See also

[base]dimnames in the base package. 02.Classes gives an overview of data classes used in LIMMA.

Author

Gordon Smyth

dupcor

Correlation Between Duplicates or Within Blocks

Bioconductor · 3.68.2 · limma/man/dupcor.Rd · 2026-05-07

Estimate the intra-block correlation given a block structure for the arrays or samples.

Aliases

duplicateCorrelation

Keywords

modelsregression

Usage

duplicateCorrelation(object, design=NULL, ndups=2, spacing=1, block=NULL, trim=0.15, weights=NULL)

Arguments

object

A matrix-like data object containing log-ratios or log-expression values for a series of samples, with rows corresponding to genes and columns to samples. Any type of data object that can be processed by getEAWP is acceptable.

design

the design matrix of the microarray experiment, with rows corresponding to arrays and columns to comparisons to be estimated. The number of rows must match the number of columns of object. Defaults to the unit vector meaning that the arrays are treated as replicates.

ndups

a positive integer giving the number of times each gene is printed on an array. nrow(object) must be divisible by ndups. Ignored if block is specified.

spacing

the spacing between the rows of object corresponding to duplicate spots, spacing=1 for consecutive spots

block

vector or factor specifying a blocking variable

trim

the fraction of observations to be trimmed from each end of tanh(all.correlations) when computing the trimmed mean.

weights

an optional numeric matrix of the same dimension as object containing weights for each spot. If smaller than object then it will be filled out to the same size.

Details

When block=NULL, this function estimates the correlation between duplicate spots (regularly spaced within-array replicate spots). If block is not null, this function estimates the correlation between repeated observations on the blocking variable. Typically the blocks are biological replicates and repeated observations on the same block may be correlated. In either case, the correlation is estimated by fitting a mixed linear model by REML individually for each gene. The function also returns a consensus correlation, which is a robust average of the individual correlations, intended for input to functions such as lmFit, gls.series or voom. It is not possible to estimate correlations between duplicate spots and with sample blocks simultaneously. If block is not null, then the function will set ndups=1, which is equivalent to ignoring duplicate spots. For this function to return statistically useful results, there must be at least two more arrays than the number of coefficients to be estimated, i.e., two more than the column rank of design. The function may take long time to execute as it fits a mixed linear model for each gene using an iterative algorithm. If present, ndups and spacing will be extracted from object$printer$ndups and object$printer$spacing.

Value

A list with components consensus.correlationthe average estimated inter-duplicate correlation. The average is the trimmed mean of the individual correlations on the atanh-transformed scale. corsame as consensus.correlation, for compatibility with earlier versions of the software atanh.correlationsnumeric vector of length nrow(object)/ndups giving the individual genewise atanh-transformed correlations.

Examples

# Simulate a paired experiment with incomplete blocks Block <- c(1,1,2,2,3,3,4,4,5,6,7,8) Treat <- factor(c(1,2,1,2,1,2,1,2,1,2,1,2)) design <- model.matrix(~Treat) ngenes <- 50 nsamples <- 12 y <- matrix(rnorm(ngenes*nsamples),ngenes,nsamples) rownames(y) <- paste0("Gene",1:ngenes) # Estimate the within-block correlation dupcor <- duplicateCorrelation(y,design,block=Block) dupcor$consensus.correlation # Estimate the treatment effect using both complete and incomplete blocks fit <- lmFit(y,design,block=Block,correlation=dupcor$consensus) fit <- eBayes(fit) topTable(fit,coef=2)

See also

These functions use [statmod:mixedmodel]mixedModel2Fit from the statmod package. An overview of linear model functions in limma is given by 06.LinearModels.

Author

Gordon Smyth

References

Smyth, G. K., Michaud, J., and Scott, H. (2005). The use of within-array replicate spots for assessing differential expression in microarray experiments. Bioinformatics 21(9), 2067-2075. [http://bioinformatics.oxfordjournals.org/content/21/9/2067] [Preprint with corrections: https://gksmyth.github.io/pubs/dupcor.pdf]

eBayes

Empirical Bayes Statistics for Differential Expression

Bioconductor · 3.68.2 · limma/man/ebayes.Rd · 2026-05-07

Given a linear model fit from lmFit, compute moderated t-statistics, moderated F-statistic, and log-odds of differential expression by empirical Bayes moderation of the standard errors towards a global value.

Aliases

eBayestreat

Keywords

empirical Bayes

Usage

eBayes(fit, proportion = 0.01, stdev.coef.lim = c(0.1,4), trend = FALSE, span = NULL, robust = FALSE, winsor.tail.p = c(0.05,0.1), legacy = NULL) treat(fit, fc = 1.2, lfc = NULL, trend = FALSE, robust = FALSE, winsor.tail.p = c(0.05,0.1), legacy = NULL, upshot = FALSE)

Arguments

fit

an MArrayLM fitted model object produced by lmFit or contrasts.fit. For ebayes only, fit can alternatively be an unclassed list produced by lm.series, gls.series or mrlm containing components coefficients, stdev.unscaled, sigma and df.residual.

proportion

numeric value between 0 and 1, assumed proportion of genes which are differentially expressed

stdev.coef.lim

numeric vector of length 2, assumed lower and upper limits for the standard deviation of log2-fold-changes for differentially expressed genes

trend

logical, should an intensity-dependent trend be allowed for the prior variance? If FALSE then the prior variance is constant. Alternatively, trend can be a row-wise numeric vector, which will be used as the covariate for the prior variance.

span

lowess span used for prior variance trend.

robust

logical, should the estimation of df.prior and var.prior be robustified against outlier sample variances?

winsor.tail.p

numeric vector of length 1 or 2, giving left and right tail proportions of x to Winsorize (if robust=TRUE and legacy=TRUE).

legacy

logical. If FALSE then the new hyperparameter estimation (introduced in limma 3.61.8) will be used, if TRUE the earlier hyperparameter estimation will be used. If NULL, then the new method will be used when the residual degrees of freedom are not all equal and the old method will be used otherwise. legacy=FALSE will also be assumed if span is not NULL.

minimum fold-change below which changes are not considered scientifically meaningful.

lfc

minimum log2-fold-change below which changes not considered scientifically meaningful. Defaults to log2(fc). If specified then takes precedence over fc.

upshot

logical. If TRUE, then a more powerful test is conducted assuming that the true logFC are uniformly distributed over the null interval from -lfc to lfc. If FALSE, then the standard TREAT p-values are computed assuming the worst-case scenario of true logFC equal to lfc.

Details

These functions are used to rank genes in order of evidence for differential expression. They use an empirical Bayes method to squeeze the genewise-wise residual variances towards a common value (or towards a global trend) (Smyth, 2004; Phipson et al, 2016). The degrees of freedom for the individual variances are increased to reflect the extra information gained from the empirical Bayes moderation, resulting in increased statistical power to detect differential expression. Theese functions accept as input an MArrayLM fitted model object fit produced by lmFit. The columns of fit define a set of contrasts which are to be tested equal to zero. The fitted model object may have been processed by contrasts.fit before being passed to eBayes to convert the coefficients of the original design matrix into an arbitrary number of contrasts. The empirical Bayes moderated t-statistics test each individual contrast equal to zero. For each gene (row), the moderated F-statistic tests whether all the contrasts are zero. The F-statistic is an overall test computed from the set of t-statistics for that probe. This is exactly analogous the relationship between t-tests and F-statistics in conventional anova, except that the residual mean squares have been moderated between genes. The estimates s2.prior and df.prior are computed by one of fitFDist, fitFDistRobustly or fitFDistUnequalDF1 (depending on settings for robust and legacy). s2.post is the weighted average of s2.prior and sigma^2 with weights proportional to df.prior and df.residual respectively. The log-odds of differential expression lods was called the B-statistic by Loennstedt and Speed (2002). The F-statistics F are computed by classifyTestsF with fstat.only=TRUE. eBayes does not compute ordinary t-statistics because they always have worse performance than the moderated versions. The ordinary (unmoderated) t-statistics can, however, can be easily extracted from the linear model output for comparison purposes---see the example code below. treat computes empirical Bayes moderated-t p-values relative to a minimum fold-change threshold. Instead of testing for genes that have true log-fold-changes different from zero, it tests whether the true log2-fold-change is greater than lfc in absolute value (McCarthy and Smyth, 2009). In other words, it uses an interval null hypothesis, where the interval is [-lfc,lfc]. When the number of DE genes is large, treat is often useful for giving preference to larger fold-changes and for prioritizing genes that are biologically important. treat is concerned with p-values rather than posterior odds, so it does not compute the B-statistic lods. The idea of thresholding doesn't apply to F-statistics in a straightforward way, so moderated F-statistics are also not computed. When fc=1 and lfc=0, treat is identical to eBayes, except that F-statistics and B-statistics are not computed. The fc threshold is usually chosen relatively small, because genes need to have fold changes substantially greater than the testing threshold in order to be considered statistically significant. Typical values for fc are 1.1, 1.2 or 1.5. The top genes chosen by treat can be examined using topTreat. The treat threshold can be specified either as a fold-change via fc or as a log2-fold-change via lfc, with lfc = log2(fc). Note that the treat testing procedure is considerably more rigorous and conservative than simply applying same fc values as a fold-change cutoff to the list of differentially expressed genes. Indeed, the observed log2-fold-change needs to substantially larger than lfc for a gene to be called as statistically significant by treat. The threshold should be chosen as a small value below which results should be ignored rather than as a target fold-change. In practice, modest values for fc such as 1.1, 1.2 or 1.5 are usually the most useful. Setting fc=1.2 or fc=1.5 will usually cause most differentially expressed genes to have estimated fold-changes of 2-fold or greater, depending on the sample size and precision of the experiment. Larger thresholds are usually overly conservative and counter productive. In general, the fc threshold should be chosen sufficiently small so that a worthwhile number of DE genes remain, otherwise the purpose of prioritizing genes with larger fold-changes will be defeated. The use of eBayes or treat with trend=TRUE is known as the limma-trend method (Law et al, 2014; Phipson et al, 2016). With this option, an intensity-dependent trend is fitted to the prior variances s2.prior. Specifically, squeezeVar is called with the covariate equal to Amean, the average log2-intensity for each gene. The trend that is fitted can be examined by plotSA. limma-trend is useful for processing expression values that show a mean-variance relationship. This is often useful for microarray data, and it can also be applied to RNA-seq counts that have been converted to log2-counts per million (logCPM) values (Law et al, 2014). When applied to RNA-seq logCPM values, limma-trend give similar results to the voom method. The voom method incorporates the mean-variance trend into the precision weights, whereas limma-trend incorporates the trend into the empirical Bayes moderation. limma-trend is somewhat simpler than voom because it assumes that the sequencing depths (library sizes) are not wildly different between the samples and it applies the mean-variance trend on a genewise basis instead to individual observations. limma-trend is recommended for RNA-seq analysis when the library sizes are reasonably consistent (less than 3-fold difference from smallest to largest) because of its simplicity and speed. If robust=TRUE then the robust empirical Bayes procedure of Phipson et al (2016) is used. This is frequently useful to protect the empirical Bayes procedure against hyper-variable or hypo-variable genes, especially when analysing RNA-seq data. See squeezeVar for more details. In limma 3.61.8 (August 2024), the new function fitFDistUnequalDF1 was introduced to improve estimation of the hyperparameters s2.prior and df.prior, especially when not all genes have the same residual degrees of freedom. fitFDistUnequalDF1 is a potential replacement for the original functions fitFDist and fitFDistRobustly and the argument legacy is provided to control backward compatibility. The new hyperparameter estimation will be used if legacy=FALSE and the original methods will be used if legacy=TRUE. If legacy=NULL, then the new method will be used if the residual degrees of freedom are unequal and the original methods otherwise. Unequal residual degrees of freedom arise in limma pipelines when the expression matrix includes missing values or from the quasi-likelihood pipeline in edgeR v4. If upshot=FALSE, then the original TREAT method is used, which computes p-values relative to the "worst case" scenario in which the true logFCs are on the boundary of the null hypothesis interval closest to the observed logFC. If upshot=TRUE, then a new more powerful UPSHOT approach suggested by Nikos Ignatiadis (University of Chicago) is used. The UPSHOT test assumes that the true logFC are uniformly distributed over the null interval from -lfc to lfc. The UPSHOT test gives correct error rate control over any unimodal distribution of true logFCs over the null interval.

Value

eBayes produces an object of class MArrayLM (see MArrayLM-class) containing everything found in fit plus the following added components: tnumeric matrix of moderated t-statistics. p.valuenumeric matrix of two-sided p-values corresponding to the t-statistics. lodsnumeric matrix giving the log-odds of differential expression (on the natural log scale). s2.priorestimated prior value for sigma^2. A row-wise vector if covariate is non-NULL, otherwise a single value. df.priordegrees of freedom associated with s2.prior. A row-wise vector if robust=TRUE, otherwise a single value. df.totalrow-wise numeric vector giving the total degrees of freedom associated with the t-statistics for each gene. Equal to df.prior+df.residual or sum(df.residual), whichever is smaller. s2.postrow-wise numeric vector giving the posterior values for sigma^2. var.priorcolumn-wise numeric vector giving estimated prior values for the variance of the log2-fold-changes for differentially expressed gene for each constrast. Used for evaluating lods. Frow-wise numeric vector of moderated F-statistics for testing all contrasts defined by the columns of fit simultaneously equal to zero. F.p.valuerow-wise numeric vector giving p-values corresponding to F. The matrices t, p.value and lods have the same dimensions as the input object fit, with rows corresponding to genes and columns to coefficients or contrasts. The vectors s2.prior, df.prior, df.total, F and F.p.value correspond to rows, with length equal to the number of genes. The vector var.prior corresponds to columns, with length equal to the number of contrasts. If s2.prior or df.prior have length 1, then the same value applies to all genes. s2.prior, df.prior and var.prior contain empirical Bayes hyperparameters used to obtain df.total, s2.post and lods. treat a produces an MArrayLM object similar to that from eBayes but without lods, var.prior, F or F.p.value.

Examples

# See also lmFit examples # Simulate gene expression data, # 6 microarrays and 100 genes with one gene differentially expressed set.seed(2016) sigma2 <- 0.05 / rchisq(100, df=10) * 10 y <- matrix(rnorm(100*6,sd=sqrt(sigma2)),100,6) design <- cbind(Intercept=1,Group=c(0,0,0,1,1,1)) y[1,4:6] <- y[1,4:6] + 1 fit <- lmFit(y,design) # Moderated t-statistic fit <- eBayes(fit) topTable(fit,coef=2) # Ordinary t-statistic ordinary.t <- fit$coef[,2] / fit$stdev.unscaled[,2] / fit$sigma # Treat relative to a 10% fold-change tfit <- treat(fit, fc=1.1) topTreat(tfit,coef=2)

See also

squeezeVar, fitFDist, tmixture.matrix, plotSA. An overview of linear model functions in limma is given by 06.LinearModels.

Note

The algorithm used by eBayes and treat with robust=TRUE was revised slightly in limma 3.27.6. The minimum df.prior returned may be slightly smaller than previously.

Author

Gordon Smyth and Davis McCarthy

References

Law CW, Chen Y, Shi W, Smyth GK (2014). Voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology 15, R29. 10.1186/gb-2014-15-2-r29. See also the Preprint Version at https://gksmyth.github.io/pubs/VoomPreprint.pdf incorporating some notational corrections. Loennstedt I, Speed TP (2002). Replicated microarray data. Statistica Sinica 12, 31-46. McCarthy DJ, Smyth GK (2009). Testing significance relative to a fold-change threshold is a TREAT. Bioinformatics 25, 765-771. 10.1093/bioinformatics/btp053 Phipson B, Lee S, Majewski IJ, Alexander WS, Smyth GK (2016). Robust hyperparameter estimation protects against hypervariable genes and improves power to detect differential expression. Annals of Applied Statistics 10, 946-963. 10.1214/16-AOAS920 Smyth GK (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology Volume 3, Issue 1, Article 3. 10.2202/1544-6115.1027. See also the Preprint Version https://gksmyth.github.io/pubs/ebayes.pdf incorporating corrections to 30 June 2009.

exprs.MA

Extract Log-Expression Matrix from MAList

Bioconductor · 3.68.2 · limma/man/exprsMA.Rd · 2026-05-07

Extract the matrix of log-expression values from an MAList object.

Aliases

exprs.MA

Keywords

array

Usage

exprs.MA(MA)

Arguments

an MAList object.

Details

Converts M and A-values to log-expression values. The output matrix will have two columns for each array, in the order green, red for each array. This contrasts with as.matrix.MAList which extracts the M-values only, or RG.MA which converts to expression values in RGList form.

Value

A numeric matrix with twice the columns of the input.

See also

02.Classes gives an overview of data classes used in LIMMA.

Author

Gordon Smyth

fitFDist

Moment Estimation of Scaled F-Distribution

Bioconductor · 3.68.2 · limma/man/fitfdist.Rd · 2026-05-07

Moment estimation of the parameters of a scaled F-distribution given one of the degrees of freedom. These functions are called internally by eBayes and squeezeVar and is not usually called directly by a user.

Aliases

fitFDistfitFDistRobustlyfitFDistUnequalDF1

Concepts

empirical Bayes

Usage

fitFDist(x, df1, covariate = NULL) fitFDistRobustly(x, df1, covariate = NULL, winsor.tail.p = c(0.05,0.1), trace = FALSE) fitFDistUnequalDF1(x, df1, covariate = NULL, span = NULL, robust = FALSE, prior.weights = NULL)

Arguments

numeric vector or array of positive values representing a sample from a scaled F-distribution.

df1

the first degrees of freedom of the F-distribution. Can be a single value, or else a vector of the same length as x.

covariate

if non-NULL, the estimated scale value will depend on this numeric covariate.

winsor.tail.p

numeric vector of length 1 or 2, giving left and right tail proportions of x to Winsorize.

trace

logical value indicating whether a trace of the iteration progress should be printed.

span

lowess span parameter. Defaults to chooseLowessSpan(length(x),small.n=500).

robust

logical. Should outlier values of x be down-weighted with results similar to fitFDistRobustly?

prior.weights

numeric vector of (non-negative) prior weights.

Details

fitFDist() implements an algorithm proposed by Smyth (2004) and Phipson et al (2016). It estimates scale and df2 under the assumption that x is distributed as scale times an F-distributed random variable on df1 and df2 degrees of freedom. The parameters are estimated using the method of moments, specifically from the mean and variance of the x values on the log-scale. When covariate is supplied, a spline curve trend will be estimated for the x values and the estimation will be adjusted for this trend (Phipson et al, 2016). fitFDistRobustly is similar to fitFDist except that it computes the moments of the Winsorized values of x, making it robust against left and right outliers. Larger values for winsor.tail.p produce more robustness but less efficiency. When covariate is supplied, a loess trend is estimated for the x values. The robust method is described by Phipson et al (2016). As well as estimating the F-distribution for the bulk of the cases, i.e., with outliers discounted, fitFDistRobustly also returns an estimated F-distribution with reduced df2 that might be appropriate for each outlier case. fitFDistUnequalDF1 was introduced in limma 3.61.8 and gives special attention to the possibility that the degrees of freedom df1 might be unequal and might include non-integer values. The most important innovation of fitFDistUnequalDF1 is downweighting of observations with lower degrees of freedom, to give more precise estimation overall. It also allows the possibility of prior weights, which can be used to downweight unreliable x values for reasons other than small df1. fitFDistUnequalDF1 implements a different robust estimation strategy to fitFDistRobustly. Instead of Winsorizing the x values, potential outliers are instead downweighted using the prior weights. Whereas fitFDist and fitFDistRobustly use unweighted moment estimation for both scale and df2, fitFDistUnequalDF1 uses weighted moment estimation for scale and profile maximum likelihood for df2. fitFDistUnequalDF1 gives improved performance over fitFDist and fitFDistRobustly, especially when the degrees of freedom are unequal but also to a lesser extent when the degrees of freedom are equal. Unequal residual degrees of freedom arise in limma pipelines when the expression matrix includes missing values, or from edgeR::voomLmFit or from the quasi-likelihood pipeline in edgeR v4 (Chen et al 2024). The edgeR v4 pipeline produces fractional degrees of freedom including, potentially, degrees of freedom less than 1.

Value

fitFDist or fitFDistUnequalDF1 with robust=FALSE produces a list with the following components: scalescale factor for F-distribution. A vector if covariate is non-NULL, otherwise a scalar. df2the second degrees of freedom of the fitted F-distribution. fitFDistRobustly returns the following components as well: tail.p.valueright tail probability of the scaled F-distribution for each x value. prob.outlierposterior probability that each case is an outlier relative to the scaled F-distribution with degrees of freedom df1 and df2. df2.outlierthe second degrees of freedom associated with extreme outlier cases. df2.shrunknumeric vector of values for the second degrees of freedom, with shrunk values for outliers. Most values are equal to df2, but outliers have reduced values depending on how extreme each case is. All values lie between df2.outlier and df2.

Examples

x <- rf(100,df1=8,df2=16) fitFDist(x,df1=8)

See also

This function is called by squeezeVar, which in turn is called by eBayes and treat. This function calls trigammaInverse.

Note

The algorithm used by fitFDistRobustly was revised slightly in limma 3.27.6. The prob.outlier value, which is the lower bound for df2.shrunk, may be slightly smaller than previously.

Author

Gordon Smyth, Belinda Phipson (fitFDistRobustly) and Lizhong Chen (fitFDistUnequalDF1).

References

Smyth GK (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology Volume 3, Issue 1, Article 3. 10.2202/1544-6115.1027 https://gksmyth.github.io/pubs/ebayes.pdf Phipson B, Lee S, Majewski IJ, Alexander WS, Smyth GK (2016). Robust hyperparameter estimation protects against hypervariable genes and improves power to detect differential expression. Annals of Applied Statistics 10, 946-963. 10.1214/16-AOAS920 Chen Y, Chen L, Lun ATL, Baldoni PL, Smyth GK (2024). edgeR 4.0: powerful differential analysis of sequencing data with expanded functionality and improved support for small counts and larger datasets. bioRxiv 2024.01.21.576131. 10.1101/2024.01.21.576131

fitGammaIntercept

Fit Intercept to Vector of Gamma Distributed Variates

Bioconductor · 3.68.2 · limma/man/fitGammaIntercept.Rd · 2026-05-07

Fit Intercept to Vector of Gamma Distributed Variates

Aliases

fitGammaIntercept

Keywords

distribution

Usage

fitGammaIntercept(y,offset=0,maxit=1000)

Arguments

numeric vector of positive response values.

offset

numeric vector giving known part of the expected value of y. Can be a single value, or else a vector of the same length as y.

maxit

maximum number of Newton iterations to be done.

Details

The values y are assumed to follow a gamma distribution with common shape parameter and with expected values given by x+offset. The function implements a globally convergent Newton iteration to estimate x.

Value

Numeric value giving intercept.

Examples

offset <- runif(10) x <- 9 mu <- x+offset y <- rgamma(10,shape=20,scale=mu/20) fitGammaIntercept(y,offset=offset)

See also

This function is called by genas.

Author

Gordon Smyth and Belinda Phipson

References

Phipson, B. (2013). Empirical Bayes modelling of expression profiles and their associations. PhD Thesis. University of Melbourne, Australia.

fitmixture

Fit Mixture Model by Non-Linear Least Squares

Bioconductor · 3.68.2 · limma/man/fitmixture.Rd · 2026-05-07

Fit Mixture Model by Non-Linear Least Squares

Aliases

fitmixture

Usage

fitmixture(log2e, mixprop, niter = 4, trace = FALSE)

Arguments

log2e

a numeric matrix containing log2 expression values. Rows correspond to probes for genes and columns to RNA samples.

mixprop

a vector of length ncol(log2e) giving the mixing proportion (between 0 and 1) for each sample.

niter

integer number of iterations.

trace

logical. If TRUE, summary working estimates are output from each iteration.

Details

A mixture experiment is one in which two reference RNA sources are mixed in different proportions to create experimental samples. Mixture experiments have been used to evaluate genomic technologies and analysis methods (Holloway et al, 2006). This function uses all the data for each gene to estimate the expression level of the gene in each of two pure samples. The function fits a nonlinear mixture model to the log2 expression values for each gene. The expected values of log2e for each gene are assumed to be of the form log2( mixprop*Y1 + (1-mixprop)*Y2 ) where Y1 and Y2 are the expression levels of the gene in the two reference samples being mixed. The mixprop values are the same for each gene but Y1 and Y2 are specific to the gene. The function returns the estimated values A=0.5*log2(Y1*Y2) and M=log2(Y2/Y1) for each gene. The nonlinear estimation algorithm implemented in fitmixture uses a nested Gauss-Newton iteration (Smyth, 1996). It is fully vectorized so that the estimation is done for all genes simultaneously.

Value

List with three components: Anumeric vector giving the estimated average log2 expression of the two reference samples for each gene Mnumeric vector giving estimated log-ratio of expression between the two reference samples for each gene stdevstandard deviation of the residual term in the mixture model for each gene

Examples

ngenes <- 100 TrueY1 <- rexp(ngenes) TrueY2 <- rexp(ngenes) mixprop <- matrix(c(0,0.25,0.75,1),1,4) TrueExpr <- TrueY1 %*% mixprop + TrueY2 %*% (1-mixprop) log2e <- log2(TrueExpr) + matrix(rnorm(ngenes*4),ngenes,4)*0.1 out <- fitmixture(log2e,mixprop) # Plot true vs estimated log-ratios plot(log2(TrueY1/TrueY2), out$M)

Author

Gordon K Smyth

References

Holloway, A. J., Oshlack, A., Diyagama, D. S., Bowtell, D. D. L., and Smyth, G. K. (2006). Statistical analysis of an RNA titration series evaluates microarray precision and sensitivity on a whole-array basis. BMC Bioinformatics 7, Article 511. 10.1186/1471-2105-7-511 Smyth, G. K. (1996). Partitioned algorithms for maximum likelihood and other nonlinear estimation. Statistics and Computing, 6, 201-216. https://gksmyth.github.io/pubs/partitio.pdf

fitted.MArrayLM

Fitted Values Method for MArrayLM Fits

Bioconductor · 3.68.2 · limma/man/fitted.MArrayLM.Rd · 2026-05-07

Obtain fitted values from a fitted linear model object.

Aliases

fitted.MArrayLM

Keywords

modelsregression

Concepts

regression

Usage

3methodfittedMArrayLM(object, )

Arguments

object

a fitted object of class inheriting from "MArrayLM".

other arguments are not currently used.

Value

A numeric matrix of fitted values.

See also

residuals.MArrayLM. See also fitted in the stats package.

Author

Gordon Smyth

genas

Genuine Association of Gene Expression Profiles

Bioconductor · 3.68.2 · limma/man/genas.Rd · 2026-05-07

Calculates biological correlation between two gene expression profiles.

Aliases

genas

Usage

genas(fit, coef=c(1,2), subset="all", plot=FALSE, alpha=0.4)

Arguments

fit

an MArrayLM fitted model object produced by lmFit or contrasts.fit and followed by eBayes.

coef

numeric vector of length 2 indicating which columns in the fit object are to be correlated.

subset

character string indicating which subset of genes to include in the correlation analysis. Choices are "all", "Fpval", "p.union", "p.int", "logFC" or "predFC".

plot

logical, should a scatterplot be produced summarizing the correlation analysis?

alpha

numeric value between 0 and 1 determining the transparency of the technical and biological ellipses if a plot is produced. alpha=0 indicates fully transparent and alpha=1 indicates fully opague.

Details

The function estimates the biological correlation between two different contrasts in a linear model. By biological correlation, we mean the correlation that would exist between the log2-fold changes (logFC) for the two contrasts, if measurement error could be eliminated and the true log-fold-changes were known. This function is motivated by the fact that different contrasts for a linear model are often strongly correlated in a technical sense. For example, the estimated logFC for multiple treatment conditions compared back to the same control group will be positively correlated even in the absence of any biological effect. This function aims to separate the biological from the technical components of the correlation. The method is explained briefly in Majewski et al (2010) and in full detail in Phipson (2013). The subset argument specifies whether and how the fit object should be subsetted. Ideally, only genes that are truly differentially expressed for one or both of the contrasts should be used estimate the biological correlation. The default is "all", which uses all genes in the fit object to estimate the biological correlation. The option "Fpval" chooses genes based on how many F-test p-values are estimated to be truly significant using the function propTrueNull. This should capture genes that display any evidence of differential expression in either of the two contrasts. The options "p.union" and "p.int" are based on the moderated t p-values from both contrasts. From the propTrueNull function an estimate of the number of p-values truly significant in either of the two contrasts can be obtained. "p.union" takes the union of these genes and "p.int" takes the intersection of these genes. The other options, "logFC" and "predFC" subsets on genes that attain a logFC or predFC at least as large as the 90th percentile of the log fold changes or predictive log fold changes on the absolute scale. The plot option is a logical argument that specifies whether or not to plot a scatter plot of log-fold-changes for the two contrasts. The biological and technical correlations are overlaid on the scatterplot using semi-transparent ellipses. library(ellipse) is required to enable the plotting of ellipses.

Value

genas produces a list with the following components: technical.correlationestimate of the technical correlation biological.correlationestimate of the biological correlation covariance.matrixestimate of the covariance matrix from which the biological correlation is obtained deviancethe likelihood ratio test statistic used to test whether the biological correlation is equal to 0 p.valuethe p.value associated with deviance nthe number of genes used to estimate the biological correlation

Examples

# Simulate gene expression data # Three conditions (Control, A and B) and 1000 genes ngene <- 1000 mu.A <- mu.B <- mu.ctrl <- rep(5,ngene) # 200 genes are differentially expressed. # All are up in condition A and down in B # so the biological correlation is negative. mu.A[1:200] <- mu.ctrl[1:200]+2 mu.B[1:200] <- mu.ctrl[1:200]-2 # Two microarrays for each condition mu <- cbind(mu.ctrl,mu.ctrl,mu.A,mu.A,mu.B,mu.B) y <- matrix(rnorm(6000,mean=mu,sd=1),ngene,6) # two experimental groups and one control group with two replicates each group <- factor(c("Ctrl","Ctrl","A","A","B","B"), levels=c("Ctrl","A","B")) design <- model.matrix(~group) # fit a linear model fit <- lmFit(y,design) fit <- eBayes(fit) # Estimate biological correlation between the logFC profiles # for A-vs-Ctrl and B-vs-Ctrl genas(fit, coef=c(2,3), plot=TRUE, subset="F")

See also

lmFit, eBayes, contrasts.fit

Note

As present, genas assumes that technical correlations between coefficients are the same for all genes, and hence it only works with fit objects that were created without observation weights or missing values. It does not work with voom pipelines, because these involve observation weights.

Author

Belinda Phipson and Gordon Smyth

References

Majewski, IJ, Ritchie, ME, Phipson, B, Corbin, J, Pakusch, M, Ebert, A, Busslinger, M, Koseki, H, Hu, Y, Smyth, GK, Alexander, WS, Hilton, DJ, and Blewitt, ME (2010). Opposing roles of polycomb repressive complexes in hematopoietic stem and progenitor cells. Blood 116, 731-739. http://www.bloodjournal.org/content/116/5/731 Phipson, B. (2013). Empirical Bayes modelling of expression profiles and their associations. PhD Thesis. University of Melbourne, Australia. http://hdl.handle.net/11343/38162 Ritchie, ME, Phipson, B, Wu, D, Hu, Y, Law, CW, Shi, W, and Smyth, GK (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research 43, e47. http://nar.oxfordjournals.org/content/43/7/e47

geneSetTest

Mean-rank Gene Set Test

Bioconductor · 3.68.2 · limma/man/geneSetTest.Rd · 2026-05-07

Test whether a set of genes is highly ranked relative to other genes in terms of a given statistic. Genes are assumed to be independent.

Aliases

geneSetTestwilcoxGST

Concepts

gene set tests

Usage

geneSetTest(index, statistics, alternative = "mixed", type= "auto", ranks.only = TRUE, nsim=9999) wilcoxGST(index, statistics, )

Arguments

index

index vector for the gene set. This can be a vector of indices, or a logical vector of the same length as statistics or, in general, any vector such that statistic[index] gives the statistic values for the gene set to be tested.

statistics

vector, any genewise statistic by which genes can be ranked.

alternative

character string specifying the alternative hypothesis, must be one of "mixed", "either", "up" or "down". "two.sided", "greater" and "less" are also permitted as synonyms for "either", "up" and "down" respectively.

type

character string specifying whether the statistics are signed (t-like, "t") or unsigned (F-like, "f") or whether the function should make an educated guess ("auto"). If the statistic is unsigned, then it assume that larger statistics are more significant.

ranks.only

logical, if TRUE only the ranks of the statistics are used.

nsim

number of random samples to take in computing the p-value. Not used if ranks.only=TRUE.

other arguments are passed to geneSetTest.

Details

These functions compute a p-value to test the hypothesis that the indexed test set of genes tends to be more highly ranked in terms of some test statistic compared to randomly chosen genes. The statistic might be any statistic of interest, for example a t-statistic or F-statistic for differential expression. Like all gene set tests, these functions can be used to detect differential expression for a group of genes, even when the effects are too small or there is too little data to detect the genes individually. wilcoxGST is a synonym for geneSetTest with ranks.only=TRUE. This version of the test procedure was developed by Michaud et al (2008), who called it mean-rank gene-set enrichment. geneSetTest performs a competitive test in the sense that genes in the test set are compared to other genes (Goeman and Buhlmann, 2007). If the statistic is a genewise test statistic for differential expression, then geneSetTest tests whether genes in the set are more differentially expressed than genes not in the set. By contrast, a self-contained gene set test such as roast tests whether genes in the test set are differentially expressed, in an absolute sense, without regard to any other genes on the array. Because it is based on permuting genes, geneSetTest assumes that the different genes (or probes) are statistically independent. (Strictly speaking, it assumes that the genes in the set are no more correlated on average than randomly chosen genes.) If inter-gene correlations are present, then a statistically significant result from geneSetTest indicates either that the set is highly ranked or that the genes in the set are positively correlated on average (Wu and Smyth, 2012). Unless gene sets with positive correlations are particularly of interest, it may be advisable to use camera or cameraPR instead to adjust the test for inter-gene correlations. Inter-gene correlations are likely to be present in differential expression experiments with biologically heterogeneous experimental units. On the other hand, the assumption of independence between genes should hold when the replicates are purely technical, i.e., when there is no biological variability between the replicate arrays in each experimental condition. The statistics are usually a set of probe-wise statistics arising for some comparison from a microarray experiment. They may be t-statistics, meaning that the genewise null hypotheses would be rejected for large positive or negative values, or they may be F-statistics, meaning that only large values are significant. Any set of signed statistics, such as log-ratios, M-values or moderated t-statistics, are treated as t-like. Any set of unsigned statistics, such as F-statistics, posterior probabilities or chi-square tests are treated as F-like. If type="auto" then the statistics will be taken to be t-like if they take both positive and negative values and will be taken to be F-like if they are all of the same sign. There are four possible alternatives to test for. alternative=="up" means the genes in the set tend to be up-regulated, with positive t-statistics. alternative=="down" means the genes in the set tend to be down-regulated, with negative t-statistics. alternative=="either" means the set is either up or down-regulated as a whole. alternative=="mixed" test whether the genes in the set tend to be differentially expressed, without regard for direction. In this case, the test will be significant if the set contains mostly large test statistics, even if some are positive and some are negative. The latter three alternatives are appropriate when there is a prior expection that all the genes in the set will react in the same direction. The "mixed" alternative is appropriate if you know only that the genes are involved in the relevant pathways, possibly in different directions. The "mixed" is the only meaningful alternative with F-like statistics. The test statistic used for the gene-set-test is the mean of the statistics in the set. If ranks.only is TRUE the only the ranks of the statistics are used. In this case the p-value is obtained from a Wilcoxon test. If ranks.only is FALSE, then the p-value is obtained by simulation using nsim random sets of genes.

Value

numeric value giving the estimated p-value.

Examples

stat <- rnorm(100) sel <- 1:10; stat[sel] <- stat[sel]+1 wilcoxGST(sel,stat)

See also

cameraPR, camera, roast, barcodeplot, wilcox.test. There is a topic page on 10.GeneSetTests.

Note

Wu and Smyth (2012) show that geneSetTest does not does correct for inter-gene correlations and is more likely to assign small p-values to sets containing positive correlated genes. The function cameraPR is recommended as a alternative.

Author

Gordon Smyth and Di Wu

References

Wu, D, and Smyth, GK (2012). Camera: a competitive gene set test accounting for inter-gene correlation. Nucleic Acids Research 40(17), e133. 10.1093/nar/gks461 Goeman, JJ, and Buhlmann P (2007). Analyzing gene expression data in terms of gene sets: methodological issues. Bioinformatics 23, 980-987. Michaud, J, Simpson, KM, Escher, R, Buchet-Poyau, K, Beissbarth, T, Carmichael, C, Ritchie, ME, Schutz, F, Cannon, P, Liu, M, Shen, X, Ito, Y, Raskind, WH, Horwitz, MS, Osato, M, Turner, DR, Speed, TP, Kavallaris, M, Smyth, GK, and Scott, HS (2008). Integrative analysis of RUNX1 downstream pathways and target genes. BMC Genomics 9, 363. 10.1186/1471-2164-9-363

getEAWP

Extract Basic Data from Expression Data Objects

Bioconductor · 3.68.2 · limma/man/getEAWP.Rd · 2026-05-07

Given an expression data object of any known class, get the expression values, weights, probe annotation and A-values that are needed for linear modelling. This function is called by the linear modelling functions in LIMMA.

Aliases

getEAWP

Usage

getEAWP(object)

Arguments

object

any matrix-like object containing log-expression values. Can be an object of class MAList, EList, marrayNorm, PLMset, vsn, or any class inheriting from ExpressionSet, or any object that can be coerced to a numeric matrix.

Details

Rows correspond to probes and columns to RNA samples. In the case of two-color microarray data objects (MAList or marrayNorm), Amean is the vector of row means of the matrix of A-values. For other data objects, Amean is the vector of row means of the matrix of expression values. From April 2013, the rownames of the output exprs matrix are required to be unique. If object has no row names, then the output rownames of exprs are 1:nrow(object). If object has row names but with duplicated names, then the rownames of exprs are set to 1:nrow(object) and the original row names are preserved in the ID column of probes. object should be a normalized data object. getEAWP will return an error if object is a non-normalized data object such as RGList or EListRaw, because these do not contain log-expression values.

Value

A list with components exprsnumeric matrix of log-ratios, log-intensities or log-expression values weightsnumeric matrix of weights probesdata.frame of probe-annotation Ameannumeric vector of average log-expression for each probe exprs is the only required component. The other components will be NULL if not found in the input object.

See also

02.Classes gives an overview of data classes used in LIMMA.

Author

Gordon Smyth

getLayout

Extract the Print Layout of an Array from the GAL File

Bioconductor · 3.68.2 · limma/man/getlayout.Rd · 2026-05-07

From the Block, Row and Column information in a genelist, determine the number of grid rows and columns on the array and the number of spot rows and columns within each grid.

Aliases

getLayoutgetLayout2getDupSpacing

Keywords

Usage

getLayout(gal, guessdups=FALSE) getLayout2(galfile) getDupSpacing(ID)

Arguments

gal

data.frame containing the GAL, i.e., giving the position and gene identifier of each spot

galfile

name or path of GAL file

guessdups

logical, if TRUE then try to determine number and spacing of duplicate spots, i.e., within-array replicates

vector or factor of gene IDs

Details

A GenePix Array List (GAL) file is a list of genes and associated information produced by an Axon microarray scanner. The function getLayout determines the print layout from a data frame created from a GAL file or gene list. The data.frame must contain columns Block, Column and Row. (The number of tip columns is assumed to be either one or four.) On some arrays, each probe may be duplicated a number of times (ndups) at regular intervals (spacing) in the GAL file. getDupSpacing determines valid values for ndups and spacing from a vector of IDs. If guessdups=TRUE, then getLayout calls getDupSpacing. The function getLayout2 attempts to determine the print layout from the header information of an actual GAL file.

Value

A printlayout object, which is a list with the following components. The last two components are present only if guessdups=TRUE. ngrid.rinteger, number of grid rows on the arrays ngrid.cinteger, number of grid columns on the arrays nspot.rinteger, number of rows of spots in each grid nspot.cinteger, number of columns of spots in each grid ndupsinteger, number of times each probe is printed on the array spacinginteger, spacing between multiple printings of each probe

Examples

# gal <- readGAL() # layout <- getLayout(gal)

See also

An overview of LIMMA functions for reading data is given in 03.ReadingData.

Author

Gordon Smyth and James Wettenhall

getSpacing

Get Numerical Spacing

Bioconductor · 3.68.2 · limma/man/getSpacing.Rd · 2026-05-07

Convert character to numerical spacing measure for within-array replicate spots.

Aliases

getSpacing

Keywords

Usage

getSpacing(spacing, layout)

Arguments

spacing

character string or integer. Acceptable character strings are "columns", "rows", "subarrays" or "topbottom". Integer values are simply passed through.

layout

list containing printer layout information

Details

"rows" means that duplicate spots are printed side-by-side by rows. These will be recorded in consecutive rows in the data object. "columns" means that duplicate spots are printed side-by-sidy by columns. These will be separated in the data object by layout$nspot.r rows. "subarrays" means that a number of sub-arrays, with identical probes in the same arrangement, are printed on each array. The spacing therefore will be the size of a sub-array. "topbottom" is the same as "subarrays" when there are two sub-arrays.

Value

Integer giving spacing between replicate spots in the gene list.

Examples

getSpacing("columns",list(ngrid.r=2,ngrid.c=2,nspot.r=20,nspot.c=19)) getSpacing("rows",list(ngrid.r=2,ngrid.c=2,nspot.r=20,nspot.c=19)) getSpacing("topbottom",list(ngrid.r=2,ngrid.c=2,nspot.r=20,nspot.c=19))

See also

An overview of LIMMA functions for reading data is given in 03.ReadingData.

Author

Gordon Smyth

gls.series

Fit Linear Model to Microarray Data by Generalized Least Squares

Bioconductor · 3.68.2 · limma/man/gls.series.Rd · 2026-05-07

Fit a linear model genewise to expression data from a series of microarrays. The fit is by generalized least squares allowing for correlation between duplicate spots or related arrays. This is a utility function for lmFit.

Aliases

gls.series

Keywords

modelsregression

Usage

gls.series(M,design=NULL,ndups=2,spacing=1,block=NULL,correlation=NULL,weights=NULL,)

Arguments

numeric matrix containing log-ratio or log-expression values for a series of microarrays, rows correspond to genes and columns to arrays.

design

numeric design matrix defining the linear model, with rows corresponding to arrays and columns to comparisons to be estimated. The number of rows must match the number of columns of M. Defaults to the unit vector meaning that the arrays are treated as replicates.

ndups

positive integer giving the number of times each gene is printed on an array. nrow(M) must be divisible by ndups. Ignored if block is not NULL.

spacing

the spacing between the rows of M corresponding to duplicate spots, spacing=1 for consecutive spots. Ignored if block is not NULL.

block

vector or factor specifying a blocking variable on the arrays. Same length as ncol(M).

correlation

numeric value specifying the inter-duplicate or inter-block correlation.

weights

an optional numeric matrix of the same dimension as M containing weights for each spot. If it is of different dimension to M, it will be filled out to the same size.

other optional arguments to be passed to dupcor.series.

Details

This is a utility function used by the higher level function lmFit. Most users should not use this function directly but should use lmFit instead. This function is for fitting gene-wise linear models when some of the expression values are correlated. The correlated groups may arise from replicate spots on the same array (duplicate spots) or from a biological or technical replicate grouping of the arrays. This function is normally called by lmFit and is not normally called directly by users. Note that the correlation is assumed to be constant across genes. If correlation=NULL then a call is made to duplicateCorrelation to estimated the correlation.

Value

A list with components coefficientsnumeric matrix containing the estimated coefficients for each linear model. Same number of rows as M, same number of columns as design. stdev.unscalednumeric matrix conformal with coef containing the unscaled standard deviations for the coefficient estimators. The standard errors are given by stdev.unscaled * sigma. sigmanumeric vector containing the residual standard deviation for each gene. df.residualnumeric vector giving the degrees of freedom corresponding to sigma correlationinter-duplicate or inter-block correlation qrQR decomposition of the generalized linear squares problem, i.e., the decomposition of design standardized by the Choleski-root of the correlation matrix defined by correlation

See also

duplicateCorrelation. An overview of linear model functions in limma is given by 06.LinearModels.

Author

Gordon Smyth

goana

Gene Ontology or KEGG Pathway Analysis

Bioconductor · 3.68.2 · limma/man/goana.Rd · 2026-05-07

Test for over-representation of gene ontology (GO) terms or KEGG pathways in one or more sets of genes, optionally adjusting for abundance or gene length bias.

Aliases

goanagoana.defaultgoana.MArrayLMkeggakegga.defaultkegga.MArrayLMgetGeneKEGGLinksgetKEGGPathwayNames

Concepts

gene set tests

Usage

goanaMArrayLM(de, coef = ncol(de), geneid = rownames(de), FDR = 0.05, trend = FALSE, ) keggaMArrayLM(de, coef = ncol(de), geneid = rownames(de), FDR = 0.05, trend = FALSE, ) goanadefault(de, universe = NULL, species = "Hs", null.prob = NULL, covariate=NULL, plot=FALSE, ) keggadefault(de, universe = NULL, restrict.universe = FALSE, species = "Hs", species.KEGG = NULL, convert = FALSE, gene.pathway = NULL, pathway.names = NULL, null.prob = NULL, covariate=NULL, plot=FALSE, ) getGeneKEGGLinks(species.KEGG = "hsa", convert = FALSE) getKEGGPathwayNames(species.KEGG = NULL, remove.qualifier = FALSE)

Arguments

a character vector of Entrez Gene IDs, or a list of such vectors, or an MArrayLM fit object.

coef

column number or column name specifying for which coefficient or contrast differential expression should be assessed.

geneid

Entrez Gene identifiers. Either a vector of length nrow(de) or the name of the column of de$genes containing the Entrez Gene IDs.

FDR

false discovery rate cutoff for differentially expressed genes. Numeric value between 0 and 1.

species

character string specifying the species. Possible values include "Hs" (human), "Mm" (mouse), "Rn" (rat), "Dm" (fly) or "Pt" (chimpanzee), but other values are possible if the corresponding organism package is available. See alias2Symbol for other possible values. Ignored if species.KEGG or is not NULL or if gene.pathway and pathway.names are not NULL.

species.KEGG

three-letter KEGG species identifier. See https://www.kegg.jp/kegg/catalog/org_list.html or https://rest.kegg.jp/list/organism for possible values. Alternatively, if de contains KEGG ortholog Ids ("k00001" etc) instead of gene Ids, then set species.KEGG="ko". This argument is ignored if gene.pathway and pathway.names are both not NULL.

convert

if TRUE then KEGG gene identifiers will be converted to NCBI Entrez Gene identifiers. Note that KEGG IDs are the same as Entrez Gene IDs for most species anyway.

gene.pathway

data.frame linking genes to pathways. First column gives gene IDs, second column gives pathway IDs. By default this is obtained automatically by getGeneKEGGLinks(species.KEGG).

remove.qualifier

if TRUE, the species qualifier will be removed from the pathway names.

pathway.names

data.frame giving full names of pathways. First column gives pathway IDs, second column gives pathway names. By default this is obtained automatically using getKEGGPathwayNames(species.KEGG, remove=TRUE).

trend

adjust analysis for gene length or abundance? Can be logical, or a numeric vector of covariate values, or the name of the column of de$genes containing the covariate values. If TRUE, then de$Amean is used as the covariate.

universe

vector specifying the set of Entrez Gene identifiers to be the background universe. If NULL then all Entrez Gene IDs associated with any gene ontology term will be used as the universe.

restrict.universe

logical, should the universe be restricted to gene identifiers found in at least one pathway in gene.pathway?

null.prob

optional numeric vector of the same length as universe giving the null probability that each gene in the universe will appear in a gene set without enrichment. Will be computed from covariate if the latter is provided. Ignored if universe is NULL.

covariate

optional numeric vector of the same length as universe giving a covariate against which null.prob should be computed. Ignored if universe is NULL.

plot

logical, should the null.prob vs covariate trend be plotted?

any other arguments in a call to the MArrayLM methods are passed to the corresponding default method.

Details

These functions perform over-representation analyses for Gene Ontology terms or KEGG pathways. The default methods accept a gene set as a vector of Entrez Gene IDs or multiple gene sets as a list of such vectors. An over-represention analysis is then done for each set. The MArrayLM method extracts the gene sets automatically from a linear model fit object. The p-values returned by goana and kegga are unadjusted for multiple testing. The authors have chosen not to correct automatically for multiple testing because GO terms and KEGG pathways are often overlapping, so standard methods of p-value adjustment may be very conservative. Users should be aware though that p-values are unadjusted, meaning that only very small p-values should be used for published results. goana uses annotation from the appropriate Bioconductor organism package. The species can be any character string XX for which an organism package org.XX.eg.db is installed. Examples are "Hs" for human or "Mm" for mouse. See alias2Symbol for other possible values for species. kegga reads KEGG pathway annotation from the KEGG website. For kegga, the species name can be provided in either Bioconductor or KEGG format. Examples of KEGG format are "hsa" for human, "mmu" for mouse of "dme" for fly. kegga can be used for any species supported by KEGG, of which there are more than 14,000 possibilities. By default, kegga obtains the KEGG annotation for the specified species from the https://rest.kegg.jp website using getGeneKEGGLinks and getKEGGPathwayNames. Alternatively one can supply the required pathway annotation to kegga in the form of two data.frames. If this is done, then an internet connection is not required. The gene ID system used by kegga for each species is determined by KEGG. For human and mouse, the default (and only choice) is Entrez Gene ID. For Drosophila, the default is FlyBase CG annotation symbol. The format of the IDs can be seen by typing head(getGeneKEGGLinks(species)), for examplehead(getGeneKEGGLinks("hsa")) or head(getGeneKEGGLinks("dme")). Entrez Gene IDs can always be used. If Entrez Gene IDs are not the default, then conversion can be done by specifying "convert=TRUE". Another possibility is to use KEGG orthology IDs as the gene IDs, and these can be used for any species. In that case, set species.KEGG="ko". The ability to supply data.frame annotation to kegga means that kegga can in principle be used in conjunction with any user-supplied set of annotation terms. The default goana and kegga methods accept a vector null.prob giving the prior probability that each gene in the universe appears in a gene set. This vector can be used to correct for unwanted trends in the differential expression analysis associated with gene length, gene abundance or any other covariate (Young et al, 2010). The MArrayLM object computes the null.prob vector automatically when trend is non-NULL. If null.prob=NULL, the function computes one-sided hypergeometric tests equivalent to Fisher's exact test. If prior probabilities are specified, then a test based on the Wallenius' noncentral hypergeometric distribution is used to adjust for the relative probability that each gene will appear in a gene set, following the approach of Young et al (2010). The MArrayLM methods performs over-representation analyses for the up and down differentially expressed genes from a linear model analysis. In this case, the universe is all the genes found in the fit object. trend=FALSE is equivalent to null.prob=NULL. If trend=TRUE or a covariate is supplied, then a trend is fitted to the differential expression results and this is used to set null.prob. The statistical approach provided here is the same as that provided by the goseq package, with one methodological difference and a few restrictions. Unlike the goseq package, the gene identifiers here must be Entrez Gene IDs and the user is assumed to be able to supply gene lengths if necessary. The goseq package has additional functionality to convert gene identifiers and to provide gene lengths. The only methodological difference is that goana and kegga computes gene length or abundance bias using tricubeMovingAverage instead of monotonic regression. While tricubeMovingAverage does not enforce monotonicity, it has the advantage of numerical stability when de contains only a small number of genes. The trend is estimated by the goanaTrend function.

Value

The goana default method produces a data frame with a row for each GO term and the following columns: TermGO term. Ontontology that the GO term belongs to. Possible values are "BP", "CC" and "MF". Nnumber of genes in the GO term. DEnumber of genes in the DE set. P.DEp-value for over-representation of the GO term in the set. The last two column names above assume one gene set with the name DE. In general, there will be a pair of such columns for each gene set and the name of the set will appear in place of "DE". The goana method for MArrayLM objects produces a data frame with a row for each GO term and the following columns: TermGO term. Ontontology that the GO term belongs to. Possible values are "BP", "CC" and "MF". Nnumber of genes in the GO term. Upnumber of up-regulated differentially expressed genes. Downnumber of down-regulated differentially expressed genes. P.Upp-value for over-representation of GO term in up-regulated genes. P.Downp-value for over-representation of GO term in down-regulated genes. The row names of the data frame give the GO term IDs. The output from kegga is the same except that row names become KEGG pathway IDs, Term becomes Pathway and there is no Ont column.

Examples

## Linear model usage: fit <- lmFit(y, design) fit <- eBayes(fit) # Standard GO analysis go.fisher <- goana(fit, species="Hs") topGO(go.fisher, sort = "up") topGO(go.fisher, sort = "down") # GO analysis adjusting for gene abundance go.abund <- goana(fit, geneid = "GeneID", trend = TRUE) topGO(go.abund, sort = "up") topGO(go.abund, sort = "down") # GO analysis adjusting for gene length bias # (assuming that y$genes$Length contains gene lengths) go.len <- goana(fit, geneid = "GeneID", trend = "Length") topGO(go.len, sort = "up") topGO(go.len, sort = "down") ## Default usage with a list of gene sets: go.de <- goana(list(DE1 = EG.DE1, DE2 = EG.DE2, DE3 = EG.DE3)) topGO(go.de, sort = "DE1") topGO(go.de, sort = "DE2") topGO(go.de, ontology = "BP", sort = "DE3") topGO(go.de, ontology = "CC", sort = "DE3") topGO(go.de, ontology = "MF", sort = "DE3") ## Standard KEGG analysis k <- kegga(fit, species="Hs") k <- kegga(fit, species.KEGG="hsa") # equivalent to previous topKEGG(k, sort = "up") topKEGG(k, sort = "down")

See also

topGO, topKEGG, goana The goseq package provides an alternative implementation of methods from Young et al (2010). Unlike the limma functions documented here, goseq will work with a variety of gene identifiers and includes a database of gene length information for various species. The gostats package also does GO analyses without adjustment for bias but with some other options. See 10.GeneSetTests for a description of other functions used for gene set testing.

Note

kegga requires an internet connection unless gene.pathway and pathway.names are both supplied. The default for kegga with species="Dm" changed from convert=TRUE to convert=FALSE in limma 3.27.8. Users wanting to use Entrez Gene IDs for Drosophila should set convert=TRUE, otherwise fly-base CG annotation symbol IDs are assumed (for example "Dme1_CG4637"). The default for restrict.universe=TRUE in kegga changed from TRUE to FALSE in limma 3.33.4. Bug fix: results from kegga with trend=TRUE or with non-NULL covariate were incorrect prior to limma 3.32.3. The results were biased towards significant Down p-values and against significant Up p-values.

Author

Gordon Smyth and Yifang Hu

References

Young MD, Wakefield MJ, Smyth GK, Oshlack A (2010). Gene ontology analysis for RNA-seq: accounting for selection bias. Genome Biology 11, R14. 10.1186/gb-2010-11-2-r14

goanaTrend

Estimate DE Trend for Gene Ontology or KEGG Pathway Analysis

Bioconductor · 3.68.2 · limma/man/goanaTrend.Rd · 2026-05-07

Given a list of differentially expressed (DE) genes and a covariate, estimate the probability of a gene being called significant as a function of the covariate. This function is typically used to estimate the gene length or gene abundance bias for a pathway analysis.

Aliases

goanaTrend

Concepts

gene set tests

Usage

goanaTrend(index.de, covariate, n.prior = 10, plot = FALSE, xlab = "Covariate Rank", ylab = "Probability gene is DE", main="DE status vs covariate")

Arguments

index.de

an index vector specifying which genes are significantly DE. Can be a vector of integer indices, or a logical vector of length nrow(covariate), or any vector such as covariate[index] selects the DE genes.

covariate

numeric vector, length equal to the number of genes in the analysis. Usually equal to gene length or average log-expression but can be any meaningful genewise covariate.

n.prior

prior number of genes using for moderating the trend towards constancy, for stability when the number of DE genes is small.

plot

if TRUE, plot the estimated tend.

xlab

label for x-axis of plot.

ylab

label for y-axis of plot.

main

main title for the plot.

Details

goanaTrend is called by goana and kegga when the trend argument is used to correct for unwanted trends in the differential expression analysis associated with gene length, gene abundance or any other covariate (Young et al, 2010). This function is analogous to the nullp function of the goseq package but the trend is estimated using tricubeMovingAverage instead of by monotonic regression. While tricubeMovingAverage does not enforce strict monotonicity, it has the advantage of numerical stability and statistical robustness when there are only a small number of DE genes. This function also moderates the estimated trend slightly towards constancy to provide more stability. The degree of moderation is determined by the n.prior argument relative to the number of DE genes.

Value

Numeric vector of same length as covariate giving estimated probabilities.

Examples

x <- runif(100) i <- 1:10 goanaTrend(i, x, plot=TRUE)

See also

goana, kegga See 10.GeneSetTests for a description of other functions used for gene set testing.

Author

Gordon Smyth and Yifang Hu

References

Young MD, Wakefield MJ, Smyth GK, Oshlack A (2010). Gene ontology analysis for RNA-seq: accounting for selection bias. Genome Biology 11, R14. 10.1186/gb-2010-11-2-r14

gridr

Row and Column Positions on Microarray

Bioconductor · 3.68.2 · limma/man/gridspotrc.Rd · 2026-05-07

Grid and spot row and column positions.

Aliases

gridrgridcspotrspotc

Keywords

Usage

gridr(layout) gridc(layout) spotr(layout) spotc(layout)

Arguments

layout

list with the components ngrid.r, ngrid.c, nspot.r and nspot.c

Value

Vector of length prod(unlist(layout)) giving the grid rows (gridr), grid columns (gridc), spot rows (spotr) or spot columns (spotc).

Author

Gordon Smyth

head

Return the First to Last Part of a Data Object

Bioconductor · 3.68.2 · limma/man/head.Rd · 2026-05-07

Retrieve the first or last parts of an RGList, MAList, EListRaw, EList, MArrayLM or TestResults object.

Aliases

head.RGListhead.MAListhead.EListRawhead.EListhead.MArrayLMhead.TestResultstail.RGListtail.MAListtail.EListRawtail.EListtail.MArrayLMtail.TestResults

Keywords

array

Usage

headEList(x, n = 6L, ) tailEList(x, n = 6L, )

Arguments

an object of class RGList, MAList, EListRaw, EList, MArrayLM or TestResults.

a single integer. If positive or zero, number rows of resulting object. If negative, all but the n last/first rows of x.

other arguments are not currently used.

Details

head (tail) returns the first (last) n rows when n >= 0 or all but the last (first) n rows when n < 0.

Value

An object like x but generally with fewer rows.

Examples

E <- matrix(rnorm(40),20,2) rownames(E) <- paste0("Gene",1:20) colnames(E) <- c("A","B") y <- new("EList",list(E=E)) head(y) tail(y)

See also

head in the utils package. 02.Classes gives an overview of data classes used in LIMMA.

Author

Gordon Smyth

heatdiagram

Stemmed Heat Diagram

Bioconductor · 3.68.2 · limma/man/heatdiagram.Rd · 2026-05-07

Creates a heat diagram showing the co-regulation of genes under one condition with a range of other conditions.

Aliases

heatdiagramheatDiagram

Concepts

plots

Usage

heatDiagram(results, coef, primary=1, names=NULL, treatments=colnames(coef), limit=NULL, orientation="landscape", low="green", high="red", cex=1, mar=NULL, ncolors=123, ) heatdiagram(stat, coef, primary=1, names=NULL, treatments=colnames(stat), critical.primary=4, critical.other=3, limit=NULL, orientation="landscape", low="green", high="red", cex=1, mar=NULL, ncolors=123, )

Arguments

results

TestResults matrix, containing elements -1, 0 or 1, from decideTests

stat

numeric matrix of test statistics. Rows correspond to genes and columns to treatments or contrasts between treatments.

coef

numeric matrix of the same size as stat. Holds the coefficients to be displayed in the plot.

primary

number or name of the column to be compared to the others. Genes are included in the diagram according to this column of stat and are sorted according to this column of coef. If primary is a name, then stat and coef must have the same column names.

names

optional character vector of gene names

treatments

optional character vector of treatment names

critical.primary

critical value above which the test statistics for the primary column are considered significant and included in the plot

critical.other

critical value above which the other test statistics are considered significant. Should usually be no larger than critical.primary although larger values are permitted.

limit

optional value for coef above which values will be plotted in extreme color. Defaults to max(abs(coef)).

orientation

"portrait" for upright plot or "landscape" for plot orientated to be wider than high. "portrait" is likely to be appropriate for inclusion in printed document while "landscape" may be appropriate for a presentation on a computer screen.

low

color associated with repressed gene regulation

high

color associated with induced gene regulation

ncolors

number of distinct colors used for each of up and down regulation

cex

factor to increase or decrease size of column and row text

mar

numeric vector of length four giving the size of the margin widths. Default is cex*c(5,6,1,1) for landscape and cex*c(1,1,4,3) for portrait.

any other arguments will be passed to the image function

Details

Users are encouraged to use heatDiagram rather than heatdiagram as the later function may be removed in future versions of limma. This function plots an image of gene expression profiles in which rows (or columns for portrait orientation) correspond to treatment conditions and columns (or rows) correspond to genes. Only genes which are significantly differentially expressed in the primary condition are included. Genes are sorted by differential expression under the primary condition. Note: the plot produced by this function is unique to the limma package. It should not be confused with "heatmaps" often used to display results from cluster analyses.

Value

An image is created on the current graphics device. A matrix with named rows containing the coefficients used in the plot is also invisibly returned.

Examples

MA <- normalizeWithinArrays(RG) design <- cbind(c(1,1,1,0,0,0),c(0,0,0,1,1,1)) fit <- lmFit(MA,design=design) contrasts.mouse <- cbind(Control=c(1,0),Mutant=c(0,1),Difference=c(-1,1)) fit <- eBayes(contrasts.fit(fit,contrasts=contrasts.mouse)) results <- decideTests(fit,method="global",p=0.1) heatDiagram(results,fit$coef,primary="Difference")

See also

image.

Author

Gordon Smyth

helpMethods

Prompt for Method Help Topics

Bioconductor · 3.68.2 · limma/man/helpMethods.Rd · 2026-05-07

For any S4 generic function, find all methods defined in currently loaded packages. Prompt the user to choose one of these to display the help document.

Aliases

helpMethods

Keywords

methods

Usage

helpMethods(genericFunction)

Arguments

genericFunction

a generic function or a character string giving the name of a generic function

Examples

helpMethods(show)

See also

[methods]showMethods

Author

Gordon Smyth

ids2indices

Convert Gene Identifiers to Indices for Gene Sets

Bioconductor · 3.68.2 · limma/man/ids2indices.Rd · 2026-05-07

Make a list of gene identifiers into a list of indices for gene sets.

Aliases

ids2indices

Concepts

gene set tests

Usage

ids2indices(gene.sets, identifiers, remove.empty=TRUE)

Arguments

gene.sets

list of character vectors, each vector containing the gene identifiers for a set of genes.

identifiers

character vector of gene identifiers.

remove.empty

logical, should sets of size zero be removed from the output?

Details

This function used to create input for romer, mroast and camera function. Typically, identifiers is the vector of Entrez Gene IDs, and gene.sets is obtained constructed from a database of gene sets, for example a representation of the Molecular Signatures Database (MSigDB) downloaded from https://bioinf.wehi.edu.au/software/MSigDB/.

Value

list of integer vectors, each vector containing the indices of a gene set in the vector identifiers.

Examples

download.file("https://bioinf.wehi.edu.au/software/MSigDB/human_c2_v5p2.rdata", "human_c2_v5p2.rdata", mode = "wb") load("human_c2_v5p2.rdata") c2.indices <- ids2indices(Hs.c2, y$genes$GeneID) camera(y, c2.indices, design)

See also

romer, mroast, camera There is a topic page on 10.GeneSetTests.

Author

Gordon Smyth and Yifang Hu

imageplot

Image Plot of Microarray Statistics

Bioconductor · 3.68.2 · limma/man/imageplot.Rd · 2026-05-07

Creates an image of colors or shades of gray that represent the values of a statistic for each spot on a spotted microarray. This function can be used to explore any spatial effects across the microarray.

Aliases

imageplot

Keywords

hplot

Usage

imageplot(z, layout, low = NULL, high = NULL, ncolors = 123, zerocenter = NULL, zlim = NULL, mar=c(2,1,1,1), legend=TRUE, )

Arguments

numeric vector or array. This vector can contain any spot statistics, such as log intensity ratios, spot sizes or shapes, or t-statistics. Missing values are allowed and will result in blank spots on the image. Infinite values are not allowed.

layout

a list specifying the dimensions of the spot matrix and the grid matrix.

low

color associated with low values of z. May be specified as a character string such as "green", "white" etc, or as a rgb vector in which c(1,0,0) is red, c(0,1,0) is green and c(0,0,1) is blue. The default value is "green" if zerocenter=T or "white" if zerocenter=F.

high

color associated with high values of z. The default value is "red" if zerocenter=T or "blue" if zerocenter=F.

ncolors

number of color shades used in the image including low and high.

zerocenter

should zero values of z correspond to a shade exactly halfway between the colors low and high? The default is TRUE if z takes positive and negative values, otherwise FALSE.

zlim

numerical vector of length 2 giving the extreme values of z to associate with colors low and high. By default zlim is the range of z. Any values of z outside the interval zlim will be truncated to the relevant limit.

mar

numeric vector of length 4 specifying the width of the margin around the plot. This argument is passed to [graphics]par.

legend

logical, if TRUE the range of z and zlim is shown in the bottom margin

any other arguments will be passed to the function image

Details

This function may be used to plot the values of any spot-specific statistic, such as the log intensity ratio, background intensity or a quality measure such as spot size or shape. The image follows the layout of an actual microarray slide with the bottom left corner representing the spot (1,1,1,1). The color range is used to represent the range of values for the statistic. When this function is used to plot the red/green log-ratios, it is intended to be an in silico version of the classic false-colored red-yellow-green image of a scanned two-color microarray. This function is related to the earlier plot.spatial function in the sma package and to the later maImage function in the marray package. It differs from plot.spatial most noticeably in that all the spots are plotted and the image is plotted from bottom left rather than from top left. It is intended to display spatial patterns and artefacts rather than to highlight only the extreme values as does plot.spatial. It differs from maImage in that any statistic may be plotted and in its use of a red-yellow-green color scheme for log-ratios, similar to the classic false-colored jpeg image, rather than the red-black-green color scheme associated with heat maps.

Value

An plot is created on the current graphics device.

Examples

M <- rnorm(8*4*16*16) imageplot(M,layout=list(ngrid.r=8,ngrid.c=4,nspot.r=16,nspot.c=16))

See also

maImage in the marray package, image in the graphics package. An overview of diagnostic functions available in LIMMA is given in 09.Diagnostics.

Author

Gordon Smyth

imageplot3by2

Write Imageplots to Files

Bioconductor · 3.68.2 · limma/man/imageplot3by2.Rd · 2026-05-07

Write imageplots to files in PNG format, six plots to a file in a 3 by 2 grid arrangement.

Aliases

imageplot3by2

Keywords

hplot

Usage

imageplot3by2(RG, z="Gb", prefix=paste("image",z,sep="-"), path=NULL, zlim=NULL, common.lim=TRUE, )

Arguments

an RGList or MAList object, or any list with component named by z

character string giving name of component of RG to plot

prefix

character string giving prefix to attach to file names

path

character string specifying directory for output files

zlim

numeric vector of length 2, giving limits of response vector to be associated with saturated colors

common.lim

logical, should all plots on a page use the same axis limits

any other arguments are passed to imageplot

Details

At the time of writing, this function writes plots in PNG format in an arrangement optimized for A4-sized paper.

Value

No value is returned, but one or more files are written to the working directory. The number of files is determined by the number of columns of RG.

See also

An overview of diagnostic functions available in LIMMA is given in 09.Diagnostics.

Author

Gordon Smyth

intraspotCorrelation

Intra-Spot Correlation for Two Color Data

Bioconductor · 3.68.2 · limma/man/intraspotCorrelation.Rd · 2026-05-07

Estimate the within-block correlation associated with spots for spotted two color microarray data.

Aliases

intraspotCorrelation

Keywords

multivariate

Usage

intraspotCorrelation(object, design, trim=0.15)

Arguments

object

an [limma:malist]MAList object or a list from which M and A values may be extracted

design

a numeric matrix containing the design matrix for linear model in terms of the individual channels. The number of rows should be twice the number of arrays. The number of columns will determine the number of coefficients estimated for each gene.

trim

the fraction of observations to be trimmed from each end of the atanh-correlations when computing the consensus correlation. See mean.

Details

This function estimates the correlation between two channels observed on each spot. The correlation is estimated by fitting a heteroscedastic regression model to the M and A-values of each gene. The function also returns a consensus correlation, which is a robust average of the individual correlations, which can be used as input for functions lmscFit. The function may take long time to execute.

Value

A list with components consensus.correlationrobust average of the estimated inter-duplicate correlations. The average is the trimmed mean of the correlations for individual genes on the atanh-transformed scale. atanh.correlationsa numeric vector giving the individual genewise correlations on the atanh scale dfnumeric matrix of degrees of freedom associated with the correlations. The first column gives the degrees of freedom for estimating the within-spot or M-value mean square while the second gives the degrees of freedom for estimating the between spot or A-value mean square.

Examples

# See lmscFit corfit <- intraspotCorrelation(MA, design) all.correlations <- tanh(corfit$atanh.correlations) boxplot(all.correlations)

See also

This function uses [statmod:remlscor]remlscore from the statmod package. An overview of methods for single channel analysis in limma is given by 07.SingleChannel.

Author

Gordon Smyth

References

Smyth, G. K. (2005). Individual channel analysis of two-colour microarray data. Proceedings of the 55th Session of the International Statistics Institute, 5-12 April 2005, Sydney, Australia, Paper 116. https://gksmyth.github.io/pubs/ISI2005-116.pdf

is.fullrank

Check for Full Column Rank

Bioconductor · 3.68.2 · limma/man/isfullrank.Rd · 2026-05-07

Test whether a numeric matrix has full column rank.

Aliases

is.fullranknonEstimable

Keywords

algebra

Usage

is.fullrank(x) nonEstimable(x)

Arguments

a numeric matrix or vector

Details

is.fullrank is used to check the integrity of design matrices in limma, for example after [limma:subsetting]subsetting operations. nonEstimable is used by lmFit to report which coefficients in a linear model cannot be estimated.

Value

is.fullrank returns TRUE or FALSE. nonEstimable returns a character vector of names for the columns of x which are linearly dependent on previous columns. If x has full column rank, then the value is NULL.

Examples

# TRUE is.fullrank(1) is.fullrank(cbind(1,0:1)) # FALSE is.fullrank(0) is.fullrank(matrix(1,2,2)) nonEstimable(matrix(1,2,2))

Author

Gordon Smyth

isNumeric

Test for Numeric Argument

Bioconductor · 3.68.2 · limma/man/isnumeric.Rd · 2026-05-07

Test whether argument is numeric or a data.frame with numeric columns.

Aliases

isNumeric

Keywords

programming

Usage

isNumeric(x)

Arguments

any object

Details

This function is used to check the validity of arguments for numeric functions. It is an attempt to emulate the behavior of internal generic math functions. isNumeric differs from is.numeric in that data.frames with all columns numeric are accepted as numeric.

Value

TRUE or FALSE

Examples

isNumeric(3) isNumeric("a") x <- data.frame(a=c(1,1),b=c(0,1)) isNumeric(x) # TRUE is.numeric(x) # FALSE

See also

[base:numeric]is.numeric, [base:groupGeneric]Math

Author

Gordon Smyth

kooperberg

Kooperberg Model-Based Background Correction for GenePix data

Bioconductor · 3.68.2 · limma/man/kooperberg.Rd · 2026-05-07

This function uses a Bayesian model to background correct GenePix microarray data.

Aliases

kooperberg

Keywords

background correction

Usage

kooperberg(RG, a = TRUE, layout = RG$printer, verbose = TRUE)

Arguments

an RGList of GenePix data, read in using read.maimages, with other.columns=c("F635 SD","B635 SD","F532 SD","B532 SD","B532 Mean","B635 Mean","F Pixels","B Pixels").

logical. If TRUE, the 'a' parameters in the model (equation 3 and 4) are estimated for each slide. If FALSE the 'a' parameters are set to unity.

layout

list containing print layout with components ngrid.r, ngrid.c, nspot.r and nspot.c. Defaults to RG$printer.

verbose

logical. If TRUE, progress is reported to standard output.

Details

This function is for use with GenePix data and is designed to cope with the problem of large numbers of negative intensities and hence missing values on the log-intensity scale. It avoids missing values in most cases and at the same time dampens down the variability of log-ratios for low intensity spots. See Kooperberg et al (2002) for more details. kooperberg uses the foreground and background intensities, standard deviations and number of pixels to compute empirical estimates of the model parameters as described in equation 2 of Kooperberg et al (2002).

Value

An RGList containing the components Rmatrix containing the background adjusted intensities for the red channel for each spot for each array Gmatrix containing the background adjusted intensities for the green channel for each spot for each array printerlist containing print layout

Examples

# This is example code for reading and background correcting GenePix data # given GenePix Results (gpr) files in the working directory (data not # provided). # get the names of the GenePix image analysis output files in the current directory genepixFiles <- dir(pattern="*\\\\.gpr$") RG <- read.maimages(genepixFiles, source="genepix", other.columns=c("F635 SD","B635 SD", "F532 SD","B532 SD","B532 Mean","B635 Mean","F Pixels","B Pixels")) RGmodel <- kooperberg(RG) MA <- normalizeWithinArrays(RGmodel)

See also

04.Background gives an overview of background correction functions defined in the LIMMA package.

Author

Matthew Ritchie

References

Kooperberg, C., Fazzio, T. G., Delrow, J. J., and Tsukiyama, T. (2002) Improved background correction for spotted DNA microarrays. Journal of Computational Biology 9, 55-66. Ritchie, M. E., Silver, J., Oshlack, A., Silver, J., Holmes, M., Diyagama, D., Holloway, A., and Smyth, G. K. (2007). A comparison of background correction methods for two-colour microarrays. Bioinformatics 23, 2700-2707. 10.1093/bioinformatics/btm412

limmaUsersGuide

View Limma User's Guide

Bioconductor · 3.68.2 · limma/man/limmaUsersGuide.Rd · 2026-05-07

Finds the location of the Limma User's Guide and optionally opens it.

Aliases

limmaUsersGuide

Keywords

documentation

Usage

limmaUsersGuide(view=TRUE)

Arguments

view

logical, should the document be opened using the default PDF document reader?

Details

The function vignette("limma") will find the short limma Vignette which describes how to obtain the Limma User's Guide. The User's Guide is not itself a true vignette because it is not automatically generated using Sweave during the package build process. This means that it cannot be found using vignette, hence the need for this special function. If the operating system is other than Windows, then the PDF viewer used is that given by Sys.getenv("R_PDFVIEWER"). The PDF viewer can be changed using Sys.putenv(R_PDFVIEWER=). This function is used by drop-down Vignettes menu when the Rgui interface for Windows is used.

Value

Character string giving the file location.

Examples

limmaUsersGuide(view=FALSE)

See also

vignette, [Biobase]openPDF, [Biobase]openVignette, Sys.getenv, Sys.putenv

Author

Gordon Smyth

버전 이력

Repository	Version	First seen	Last seen
Bioconductor	3.69.2	2026-06-29	2026-07-24
Bioconductor	3.69.1	2026-05-31	2026-06-03
Bioconductor	3.69.0	2026-05-20	2026-05-22
Bioconductor	3.68.4	2026-07-01	2026-07-24
Bioconductor	3.68.3	2026-05-31	2026-06-02
Bioconductor	3.68.2	2026-05-21	2026-05-22
Bioconductor	3.68.1	2026-05-07	2026-05-07

보안

표시할 OSV 데이터가 없습니다.

문헌 신호

표시할 OpenAlex 데이터가 없습니다.