bioregion

R 패키지 메타데이터와 수집 신호를 모아 봅니다.

Packages / CRAN / bioregion

bioregion

v1.4.0

bioregion

Repository CRANLicense GPL-3Lifecycle activeNeeds compilation yes

Package page Reference Source DOI

DOI

10.32614/CRAN.package.bioregion

Core Signals

첫 화면에서 판단해야 할 수집 신호를 먼저 배치합니다.

표시할 핵심 신호가 없습니다.

Supported Backends

DESCRIPTION에서 감지한 backend 관련 package입니다.

data.table

Imports

Quick Facts

기본 메타데이터를 작은 카드와 토큰으로 압축합니다.

profile

Repository

CRAN

Version

1.4.0

License

GPL-3

Lifecycle

active

Needs compilation

yes

Last observed

2026-07-10

CRAN: cran.r-project.org/package=bioregion

Build fields

LinkingTo

Rcpp

수집 소스별 패키지 정보

1개 소스

CRAN

1.4.0

2026-07-10

License

GPL-3

Depends

R (>= 4.0.0)

Imports

ape, apcluster, bipartite, cluster, data.table, dbscan, dynamicTreeCut, fastcluster, fastkmedoids, ggplot2, grDevices, httr, igraph, mathjaxr, Matrix, phangorn, rcartocolor, Rdpack, rlang, rmarkdown, segmented, sf, stats, tidyr, utils

Suggests

ade4, adespatial, betapart, dplyr, ecodist, knitr, microbenchmark, rnaturalearth, rnaturalearthdata, terra, testthat (>= 3.0.0), vegan

LinkingTo

Rcpp

Needs compilation

yes

Lifecycle

active

Last observed

2026-07-10 02:43:42

이 패키지가 의존하는 패키지

5개 표시전체 38개

Package	Type	Spec
apcluster CRAN · 1.4.0 · 2026-07-24	Imports	apcluster
ape CRAN · 1.4.0 · 2026-07-24	Imports	ape
bipartite CRAN · 1.4.0 · 2026-07-24	Imports	bipartite
cluster CRAN · 1.4.0 · 2026-07-24	Imports	cluster
data.table CRAN · 1.4.0 · 2026-07-24	Imports	data.table
dbscan CRAN · 1.4.0 · 2026-07-24	Imports	dbscan
dynamicTreeCut CRAN · 1.4.0 · 2026-07-24	Imports	dynamicTreeCut
fastcluster CRAN · 1.4.0 · 2026-07-24	Imports	fastcluster
fastkmedoids CRAN · 1.4.0 · 2026-07-24	Imports	fastkmedoids
ggplot2 CRAN · 1.4.0 · 2026-07-24	Imports	ggplot2
grDevices CRAN · 1.4.0 · 2026-07-24	Imports	grDevices
httr CRAN · 1.4.0 · 2026-07-24	Imports	httr
igraph CRAN · 1.4.0 · 2026-07-24	Imports	igraph
mathjaxr CRAN · 1.4.0 · 2026-07-24	Imports	mathjaxr
Matrix CRAN · 1.4.0 · 2026-07-24	Imports	Matrix
phangorn CRAN · 1.4.0 · 2026-07-24	Imports	phangorn
rcartocolor CRAN · 1.4.0 · 2026-07-24	Imports	rcartocolor
Rdpack CRAN · 1.4.0 · 2026-07-24	Imports	Rdpack
rlang CRAN · 1.4.0 · 2026-07-24	Imports	rlang
rmarkdown CRAN · 1.4.0 · 2026-07-24	Imports	rmarkdown
segmented CRAN · 1.4.0 · 2026-07-24	Imports	segmented
sf CRAN · 1.4.0 · 2026-07-24	Imports	sf
stats CRAN · 1.4.0 · 2026-07-24	Imports	stats
tidyr CRAN · 1.4.0 · 2026-07-24	Imports	tidyr
utils CRAN · 1.4.0 · 2026-07-24	Imports	utils
Rcpp CRAN · 1.4.0 · 2026-07-24	LinkingTo	Rcpp
ade4 CRAN · 1.4.0 · 2026-07-24	Suggests	ade4
adespatial CRAN · 1.4.0 · 2026-07-24	Suggests	adespatial
betapart CRAN · 1.4.0 · 2026-07-24	Suggests	betapart
dplyr CRAN · 1.4.0 · 2026-07-24	Suggests	dplyr
ecodist CRAN · 1.4.0 · 2026-07-24	Suggests	ecodist
knitr CRAN · 1.4.0 · 2026-07-24	Suggests	knitr
microbenchmark CRAN · 1.4.0 · 2026-07-24	Suggests	microbenchmark
rnaturalearth CRAN · 1.4.0 · 2026-07-24	Suggests	rnaturalearth
rnaturalearthdata CRAN · 1.4.0 · 2026-07-24	Suggests	rnaturalearthdata
terra CRAN · 1.4.0 · 2026-07-24	Suggests	terra
testthat CRAN · 1.4.0 · 2026-07-24	Suggests	testthat (>= 3.0.0)
vegan CRAN · 1.4.0 · 2026-07-24	Suggests	vegan
검색 결과가 없습니다.

1 / 8

이 패키지를 쓰는 패키지

0개 표시전체 0개

Package	Type	Spec
표시할 dependency edge가 없습니다.
검색 결과가 없습니다.

1 / 1

패키지 페이지

All links

Repository: CRAN
Version: 1.4.0
Collected: 2026-07-23 02:06:32
Package page: https://cran.r-project.org/web/packages/bioregion/index.html
DOI: 10.32614/CRAN.package.bioregion
Citation: https://cran.r-project.org/web/packages/bioregion/citation.html
CRAN checks: https://cran.r-project.org/web/checks/check_results_bioregion.html
Reference HTML: https://cran.r-project.org/web/packages/bioregion/refman/bioregion.html
Reference PDF: https://cran.r-project.org/web/packages/bioregion/bioregion.pdf
Source package: https://cran.r-project.org/src/contrib/bioregion_1.4.0.tar.gz
Archive: https://CRAN.R-project.org/src/contrib/Archive/bioregion

Page fields

Author: Maxime Lenormand [aut, cre], Boris Leroy [aut], Pierre Denelle [aut]
BugReports: https://github.com/bioRgeo/bioregion/issues
CRAN Checks: bioregion results
Citation: bioregion citation info
DOI: 10.32614/CRAN.package.bioregion
License: GPL-3
LinkingTo: Rcpp
Maintainer: Maxime Lenormand <maxime.lenormand at inrae.fr>
NeedsCompilation: yes
Old Sources: bioregion archive
Package Source: bioregion_1.4.0.tar.gz
Published: 2026-03-29
Reference Manual: bioregion.html , bioregion.pdf
URL: https://github.com/bioRgeo/bioregion , https://bioRgeo.github.io/bioregion/
Version: 1.4.0
Vignettes: Tutorial for bioregion ( source , R code )
Windows Binaries: r-devel: bioregion_1.4.0.zip , r-release: bioregion_1.4.0.zip , r-oldrel: bioregion_1.4.0.zip
MacOS Binaries: r-release (arm64): bioregion_1.4.0.tgz , r-oldrel (arm64): bioregion_1.4.0.tgz , r-release (x86_64): bioregion_1.4.0.tgz , r-oldrel (x86_64): bioregion_1.4.0.tgz

Version

1.4.0

LinkingTo

Rcpp

Published

2026-03-29

DOI

10.32614/CRAN.package.bioregion

Author

Maxime Lenormand [aut, cre], Boris Leroy [aut], Pierre Denelle [aut]

https://orcid.org/0000-0001-6362-3473 https://orcid.org/0000-0002-7686-4302 https://orcid.org/0000-0001-5037-2281

Maintainer

Maxime Lenormand <maxime.lenormand at inrae.fr>

BugReports

https://github.com/bioRgeo/bioregion/issues

License

GPL-3

URL

https://github.com/bioRgeo/bioregion , https://bioRgeo.github.io/bioregion/

https://github.com/bioRgeo/bioregion https://bioRgeo.github.io/bioregion/

NeedsCompilation

yes

Citation

bioregion citation info

CRAN Checks

bioregion results

Reference Manual

bioregion.html , bioregion.pdf

bioregion.html bioregion.pdf

Vignettes

Tutorial for bioregion ( source , R code )

Tutorial for bioregion source R code

Package Source

bioregion_1.4.0.tar.gz

Windows Binaries

r-devel: bioregion_1.4.0.zip , r-release: bioregion_1.4.0.zip , r-oldrel: bioregion_1.4.0.zip

bioregion_1.4.0.zip bioregion_1.4.0.zip bioregion_1.4.0.zip

MacOS Binaries

r-release (arm64): bioregion_1.4.0.tgz , r-oldrel (arm64): bioregion_1.4.0.tgz , r-release (x86_64): bioregion_1.4.0.tgz , r-oldrel (x86_64): bioregion_1.4.0.tgz

bioregion_1.4.0.tgz bioregion_1.4.0.tgz bioregion_1.4.0.tgz bioregion_1.4.0.tgz

Old Sources

bioregion archive

Page sections 3

Documentation

Heading

Documentation

Links

[{"label":"bioregion.html","section":"","type":"","url":"https://cran.r-project.org/web/packages/bioregion/refman/bioregion.html"},{"label":"bioregion.pdf","section":"","type":"","url":"https://cran.r-project.org/web/packages/bioregion/bioregion.pdf"},{"label":"Tutorial for bioregion","section":"","type":"","url":"https://cran.r-project.org/web/packages/bioregion/vignettes/bioregion.html"},{"label":"source","section":"","type":"","url":"https://cran.r-project.org/web/packages/bioregion/vignettes/bioregion.Rmd"},{"label":"R code","section":"","type":"","url":"https://cran.r-project.org/web/packages/bioregion/vignettes/bioregion.R"}]

Text

Reference manual: bioregion.html , bioregion.pdf Vignettes: Tutorial for bioregion ( source , R code )

Downloads

Heading

Downloads

Links

[{"label":"bioregion_1.4.0.tar.gz","section":"","type":"","url":"https://cran.r-project.org/src/contrib/bioregion_1.4.0.tar.gz"},{"label":"bioregion_1.4.0.zip","section":"","type":"","url":"https://cran.r-project.org/bin/windows/contrib/4.7/bioregion_1.4.0.zip"},{"label":"bioregion_1.4.0.zip","section":"","type":"","url":"https://cran.r-project.org/bin/windows/contrib/4.6/bioregion_1.4.0.zip"},{"label":"bioregion_1.4.0.zip","section":"","type":"","url":"https://cran.r-project.org/bin/windows/contrib/4.5/bioregion_1.4.0.zip"},{"label":"bioregion_1.4.0.tgz","section":"","type":"","url":"https://cran.r-project.org/bin/macosx/sonoma-arm64/contrib/4.6/bioregion_1.4.0.tgz"},{"label":"bioregion_1.4.0.tgz","section":"","type":"","url":"https://cran.r-project.org/bin/macosx/big-sur-arm64/contrib/4.5/bioregion_1.4.0.tgz"},{"label":"bioregion_1.4.0.tgz","section":"","type":"","url":"https://cran.r-project.org/bin/macosx/big-sur-x86_64/contrib/4.6/bioregion_1.4.0.tgz"},{"label":"bioregion_1.4.0.tgz","section":"","type":"","url":"https://cran.r-project.org/bin/macosx/big-sur-x86_64/contrib/4.5/bioregion_1.4.0.tgz"},{"label":"bioregion archive","section":"","type":"","url":"https://CRAN.R-project.org/src/contrib/Archive/bioregion"}]

Text

Package source: bioregion_1.4.0.tar.gz Windows binaries: r-devel: bioregion_1.4.0.zip , r-release: bioregion_1.4.0.zip , r-oldrel: bioregion_1.4.0.zip macOS binaries: r-release (arm64): bioregion_1.4.0.tgz , r-oldrel (arm64): bioregion_1.4.0.tgz , r-release (x86_64): bioregion_1.4.0.tgz , r-oldrel (x86_64): bioregion_1.4.0.tgz Old sources: bioregion archive

Linking

Heading

Linking

Links

[{"label":"https://CRAN.R-project.org/package=bioregion","section":"","type":"","url":"https://CRAN.R-project.org/package=bioregion"}]

Text

Please use the canonical form https://CRAN.R-project.org/package=bioregion to link to this page.

Documentation 5

bioregion.html bioregion.pdf Tutorial for bioregion source R code

Vignettes 3

Tutorial for bioregion source R code

Downloads 9

bioregion_1.4.0.tar.gz bioregion_1.4.0.zip bioregion_1.4.0.zip bioregion_1.4.0.zip bioregion_1.4.0.tgz bioregion_1.4.0.tgz bioregion_1.4.0.tgz bioregion_1.4.0.tgz bioregion archive

All page links 58

ape apcluster bipartite cluster data.table dbscan dynamicTreeCut fastcluster fastkmedoids ggplot2 grDevices httr igraph mathjaxr Matrix phangorn rcartocolor Rdpack rlang rmarkdown Rcpp ade4 adespatial betapart dplyr ecodist knitr microbenchmark rnaturalearth rnaturalearthdata terra testthat vegan 10.32614/CRAN.package.bioregion https://orcid.org/0000-0001-6362-3473 https://orcid.org/0000-0002-7686-4302 https://orcid.org/0000-0001-5037-2281 https://github.com/bioRgeo/bioregion/issues GPL-3 https://github.com/bioRgeo/bioregion https://bioRgeo.github.io/bioregion/bioregion citation info bioregion results bioregion.html bioregion.pdf Tutorial for bioregion source R code bioregion_1.4.0.tar.gz bioregion_1.4.0.zip bioregion_1.4.0.zip bioregion_1.4.0.zip bioregion_1.4.0.tgz bioregion_1.4.0.tgz bioregion_1.4.0.tgz bioregion_1.4.0.tgz bioregion archive https://CRAN.R-project.org/package=bioregion

패키지 문서 원문

3 artifacts

citation

Citation

CRAN · 1.4.0 · Citation · text/html · 1,040 · 2026-05-07

Title

CRAN: bioregion citation info

Label

Citation

Text content

CRAN: bioregion citation info Denelle, P., Leroy, B., Lenormand, M. (2025). “Bioregionalization analyses with the bioregion R package.” Methods in Ecology and Evolution , 16 , 496-506. doi:10.1111/2041-210X.14496 . Corresponding BibTeX entry: @Article{, author = {{Denelle} and {P.} and {Leroy} and {B.} and {Lenormand} and {M.}}, title = {Bioregionalization analyses with the bioregion R package}, journal = {Methods in Ecology and Evolution}, year = {2025}, pages = {496-506}, volume = {16}, doi = {10.1111/2041-210X.14496}, }

reference_manual_html

Reference manual HTML

CRAN · 1.4.0 · Documentation · text/html · 259,621 · 2026-05-07

Title

Help for package bioregion

Label

Reference manual HTML

Text content

Help for package bioregion const macros = { "\\R": "\\textsf{R}", "\\mbox": "\\text", "\\code": "\\texttt"}; function processMathHTML() { var l = document.getElementsByClassName('reqn'); for (let e of l) { katex.render(e.textContent, e, { throwOnError: false, macros }); } return; } Package {bioregion} Contents as_bioregion_pairwise betapart_to_bioregion bind_pairwise bioregion_colors bioregion_metrics bioregionalization_metrics compare_bioregionalizations cut_tree dissimilarity dissimilarity_to_similarity exportGDF find_optimal_n fishdf fishmat fishsf hclu_diana hclu_hierarclust hclu_optics install_binaries map_bioregions mat_to_net net_to_mat netclu_beckett netclu_greedy netclu_infomap netclu_labelprop netclu_leadingeigen netclu_leiden netclu_louvain netclu_oslom netclu_walktrap nhclu_affprop nhclu_clara nhclu_clarans nhclu_dbscan nhclu_kmeans nhclu_pam similarity similarity_to_dissimilarity site_species_metrics site_species_subset vegedf vegemat vegesf Type: Package Title: Comparison of Bioregionalization Methods Version: 1.4.0 Description: The main purpose of this package is to propose a transparent methodological framework to compare bioregionalization methods based on hierarchical and non-hierarchical clustering algorithms (Kreft & Jetz (2010) < doi:10.1111/j.1365-2699.2010.02375.x >) and network algorithms (Lenormand et al. (2019) < doi:10.1002/ece3.4718 > and Leroy et al. (2019) < doi:10.1111/jbi.13674 >). Depends: R (≥ 4.0.0) License: GPL-3 Encoding: UTF-8 LazyData: true Imports: ape, apcluster, bipartite, cluster, data.table, dbscan, dynamicTreeCut, fastcluster, fastkmedoids, ggplot2, grDevices, httr, igraph, mathjaxr, Matrix, phangorn, rcartocolor, Rdpack, rlang, rmarkdown, segmented, sf, stats, tidyr, utils RdMacros: mathjaxr, Rdpack LinkingTo: Rcpp Suggests: ade4, adespatial, betapart, dplyr, ecodist, knitr, microbenchmark, rnaturalearth, rnaturalearthdata, terra, testthat (≥ 3.0.0), vegan VignetteBuilder: knitr RoxygenNote: 7.3.3 URL: https://github.com/bioRgeo/bioregion , https://bioRgeo.github.io/bioregion/ BugReports: https://github.com/bioRgeo/bioregion/issues Config/testthat/edition: 3 NeedsCompilation: yes Packaged: 2026-03-29 09:45:16 UTC; maxime Author: Maxime Lenormand [aut, cre], Boris Leroy [aut], Pierre Denelle [aut] Maintainer: Maxime Lenormand <maxime.lenormand@inrae.fr> Repository: CRAN Date/Publication: 2026-03-29 15:50:22 UTC Convert a matrix or list of matrices to a bioregion (dis)similarity object Description Converts a (dis)similarity matrix or a list of such matrices into a bioregion.pairwise object compatible with the bioregion package. The input can come from base R, dist objects, or outputs from other packages. Usage as_bioregion_pairwise( mat, metric_name = NULL, pkg = NULL, is_similarity = FALSE ) Arguments mat A matrix , a dist object, or a list of these representing pairwise similarity or dissimilarity values to convert into a bioregion.pairwise object. This function can also directly handle outputs from other R packages (see the pkg argument). metric_name Optional character vector or single character string specifying the name of the (dis)similarity metric(s), which will appear as column names in the output (see Note). pkg An optional character string indicating the name of the package from which mat was generated ( NULL by default, see Details). Available options are "adespatial" , "betapart" , "ecodist" , or "vegan" . is_similarity A logical value indicating whether the input data represents similarity ( TRUE ) or dissimilarity ( FALSE ). Details This function can directly handle outputs from ten functions across four packages: adespatial : beta.div , beta.div.comp betapart : beta.pair , beta.pair.abund , betapart.core , betapart.core.abund ecodist : distance , bcdist vegan : vegdist , designdist See the documentation of these packages for more information: https://cran.r-project.org/package=adespatial https://cran.r-project.org/package=betapart https://cran.r-project.org/package=ecodist https://cran.r-project.org/package=vegan Value A dissimilarity or similarity object of class bioregion.pairwise , compatible with the bioregion package. Note If no specific package is specified (i.e., pkg = NULL ), site names will be based on the row names of the first matrix. If row names are NULL , they will be generated automatically. If mat is a named list, those names will be used as column names only if metric_name = NULL . Author(s) Maxime Lenormand ( maxime.lenormand@inrae.fr ) Boris Leroy ( leroy.boris@gmail.com ) Pierre Denelle ( pierre.denelle@gmail.com ) See Also For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a3_pairwise_metrics.html . Associated functions: dissimilarity similarity bind_pairwise Examples mat <- matrix(runif(100), 10, 10) rownames(mat) <- paste0("s",1:10) pair <- as_bioregion_pairwise(list(mat,mat,mat), metric_name = NULL, pkg = NULL, is_similarity = FALSE) Convert betapart dissimilarity to bioregion dissimilarity (DEPRECATED) Description This function converts dissimilarity results produced by the betapart package (and packages using betapart, such as phyloregion) into a dissimilarity object compatible with the bioregion package. This function only converts object types to make them compatible with bioregion; it does not modify the beta-diversity values. This function allows the inclusion of phylogenetic beta diversity to compute bioregions with bioregion. Usage betapart_to_bioregion(betapart_result) Arguments betapart_result An object produced by the betapart package (e.g., using the beta.pair function). Value A dissimilarity object of class bioregion.pairwise , compatible with the bioregion package. Author(s) Boris Leroy ( leroy.boris@gmail.com ) Maxime Lenormand ( maxime.lenormand@inrae.fr ) Pierre Denelle ( pierre.denelle@gmail.com ) See Also This function is deprecated, use as_bioregion_pairwise instead. Examples comat <- matrix(sample(0:1000, size = 50, replace = TRUE, prob = 1 / 1:1001), 5, 10) rownames(comat) <- paste0("Site", 1:5) colnames(comat) <- paste0("Species", 1:10) ## Not run: beta_div <- betapart::beta.pair.abund(comat) betapart_to_bioregion(beta_div) ## End(Not run) Combine and enrich bioregion (dis)similarity object(s) Description Combine two bioregion.pairwise objects and/or compute new pairwise metrics based on the columns of the object(s). Usage bind_pairwise(primary_metrics, secondary_metrics, new_metrics = NULL) Arguments primary_metrics A bioregion.pairwise object. This is the main set of pairwise metrics that will be used as a base for the combination. secondary_metrics A second bioregion.pairwise object to be combined with primary_metrics . It must have the same sites identifiers and pairwise structure. Can be set to NULL if new_metrics is specified. new_metrics A character vector or a single character string specifying custom formula(s) based on the column names of primary_metrics and secondary_metrics (see Details). Details When both primary_metrics and secondary_metrics are provided and if the pairwise structure is identical the function combine the two objects. If new_metrics is provided, each formula is evaluated based on the column names of primary_metrics (and secondary_metrics if provided). Value A new bioregion.pairwise object containing the combined and/or enriched data. It includes all original metrics from the inputs, as well as any newly computed metrics. Author(s) Maxime Lenormand ( maxime.lenormand@inrae.fr ) Boris Leroy ( leroy.boris@gmail.com ) Pierre Denelle ( pierre.denelle@gmail.com ) See Also For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a3_pairwise_metrics.html . Associated functions: dissimilarity similarity as_bioregion_pairwise Examples comat <- matrix(sample(0:1000, size = 50, replace = TRUE, prob = 1 / 1:1001), 5, 10) rownames(comat) <- paste0("s", 1:5) colnames(comat) <- paste0("sp", 1:10) sim

section

bioregion.pdf

CRAN · 1.4.0 · Documentation · application/pdf · 323,861 · 2026-05-07

Title

bioregion.pdf

Label

bioregion.pdf

Reference for bioregion (1.4.0)

44개 topic

as_bioregion_pairwise

Convert a matrix or list of matrices to a bioregion (dis)similarity object

CRAN · 1.4.0 · bioregion/man/as_bioregion_pairwise.Rd · 2026-05-07

Converts a (dis)similarity matrix or a list of such matrices into a bioregion.pairwise object compatible with the bioregion package. The input can come from base R, dist objects, or outputs from other packages.

Aliases

as_bioregion_pairwise

Usage

as_bioregion_pairwise( mat, metric_name = NULL, pkg = NULL, is_similarity = FALSE )

Arguments

mat

A matrix, a dist object, or a list of these representing pairwise similarity or dissimilarity values to convert into a bioregion.pairwise object. This function can also directly handle outputs from other R packages (see the pkg argument).

metric_name

Optional character vector or single character string specifying the name of the (dis)similarity metric(s), which will appear as column names in the output (see Note).

pkg

An optional character string indicating the name of the package from which mat was generated (NULL by default, see Details). Available options are "adespatial", "betapart", "ecodist", or "vegan".

is_similarity

A logical value indicating whether the input data represents similarity (TRUE) or dissimilarity (FALSE).

Details

This function can directly handle outputs from ten functions across four packages: adespatial: [adespatial:beta.div]beta.div, [adespatial:beta.div.comp]beta.div.comp betapart: [betapart:beta.pair]beta.pair, [betapart:beta.pair.abund]beta.pair.abund, [betapart:betapart.core]betapart.core, [betapart:betapart.core.abund]betapart.core.abund ecodist: [ecodist:distance]distance, [ecodist:bcdist]bcdist vegan: [vegan:vegdist]vegdist, [vegan:designdist]designdist See the documentation of these packages for more information: https://cran.r-project.org/package=adespatial https://cran.r-project.org/package=betapart https://cran.r-project.org/package=ecodist https://cran.r-project.org/package=vegan

Value

A dissimilarity or similarity object of class bioregion.pairwise, compatible with the bioregion package.

Examples

mat <- matrix(runif(100), 10, 10) rownames(mat) <- paste0("s",1:10) pair <- as_bioregion_pairwise(list(mat,mat,mat), metric_name = NULL, pkg = NULL, is_similarity = FALSE)

See also

For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a3_pairwise_metrics.html. Associated functions: dissimilarity similarity bind_pairwise

Note

If no specific package is specified (i.e., pkg = NULL), site names will be based on the row names of the first matrix. If row names are NULL, they will be generated automatically. If mat is a named list, those names will be used as column names only if metric_name = NULL.

Author

Maxime Lenormand (maxime.lenormand@inrae.fr) Boris Leroy (leroy.boris@gmail.com) Pierre Denelle (pierre.denelle@gmail.com)

betapart_to_bioregion

Convert betapart dissimilarity to bioregion dissimilarity (DEPRECATED)

CRAN · 1.4.0 · bioregion/man/betapart_to_bioregion.Rd · 2026-05-07

This function converts dissimilarity results produced by the betapart package (and packages using betapart, such as phyloregion) into a dissimilarity object compatible with the bioregion package. This function only converts object types to make them compatible with bioregion; it does not modify the beta-diversity values. This function allows the inclusion of phylogenetic beta diversity to compute bioregions with bioregion.

Aliases

betapart_to_bioregion

Usage

betapart_to_bioregion(betapart_result)

Arguments

betapart_result

An object produced by the betapart package (e.g., using the beta.pair function).

Value

A dissimilarity object of class bioregion.pairwise, compatible with the bioregion package.

Examples

comat <- matrix(sample(0:1000, size = 50, replace = TRUE, prob = 1 / 1:1001), 5, 10) rownames(comat) <- paste0("Site", 1:5) colnames(comat) <- paste0("Species", 1:10) beta_div <- betapart::beta.pair.abund(comat) betapart_to_bioregion(beta_div)

See also

This function is deprecated, use as_bioregion_pairwise instead.

Author

Boris Leroy (leroy.boris@gmail.com) Maxime Lenormand (maxime.lenormand@inrae.fr) Pierre Denelle (pierre.denelle@gmail.com)

bind_pairwise

Combine and enrich bioregion (dis)similarity object(s)

CRAN · 1.4.0 · bioregion/man/bind_pairwise.Rd · 2026-05-07

Combine two bioregion.pairwise objects and/or compute new pairwise metrics based on the columns of the object(s).

Aliases

bind_pairwise

Usage

bind_pairwise(primary_metrics, secondary_metrics, new_metrics = NULL)

Arguments

primary_metrics

A bioregion.pairwise object. This is the main set of pairwise metrics that will be used as a base for the combination.

secondary_metrics

A second bioregion.pairwise object to be combined with primary_metrics. It must have the same sites identifiers and pairwise structure. Can be set to NULL if new_metrics is specified.

new_metrics

A character vector or a single character string specifying custom formula(s) based on the column names of primary_metrics and secondary_metrics (see Details).

Details

When both primary_metrics and secondary_metrics are provided and if the pairwise structure is identical the function combine the two objects. If new_metrics is provided, each formula is evaluated based on the column names of primary_metrics (and secondary_metrics if provided).

Value

A new bioregion.pairwise object containing the combined and/or enriched data. It includes all original metrics from the inputs, as well as any newly computed metrics.

Examples

comat <- matrix(sample(0:1000, size = 50, replace = TRUE, prob = 1 / 1:1001), 5, 10) rownames(comat) <- paste0("s", 1:5) colnames(comat) <- paste0("sp", 1:10) sim <- bind_pairwise(primary_metrics = similarity(comat, metric = "abc"), secondary_metrics = similarity(comat, metric = "Simpson"), new_metrics = "1 - (b + c) / (a + b + c)")

See also

Author

Maxime Lenormand (maxime.lenormand@inrae.fr) Boris Leroy (leroy.boris@gmail.com) Pierre Denelle (pierre.denelle@gmail.com)

bioregion_colors

Add color palettes to bioregion cluster objects

CRAN · 1.4.0 · bioregion/man/bioregion_colors.Rd · 2026-05-07

This function assigns colors to clusters in a bioregion.clusters object using color palettes from the rcartocolor package. It handles large numbers of clusters by assigning vivid colors to the most important clusters (based on size), grey shades to less important clusters, and optionally black to insignificant clusters.

Aliases

bioregion_colors

Usage

bioregion_colors( clusters, palette = "Vivid", cluster_ordering = "n_sites", cutoff_insignificant = NULL )

Arguments

clusters

An object of class bioregion.clusters, typically output from clustering functions such as [=netclu_infomap]netclu_infomap(), [=hclu_hierarclust]hclu_hierarclust(), or [=nhclu_pam]nhclu_pam().

palette

A character string indicating which color palette from rcartocolor to use. Default is "Vivid". Other qualitative palettes include "Bold", "Prism", "Safe", "Antique", and "Pastel".

cluster_ordering

A character string indicating the criterion for ranking clusters to determine color assignment priority. Options are: "n_sites" (default): Rank by number of sites in each cluster "n_species": Rank by number of species (bipartite networks only) "n_both": Rank by combined sites + species (bipartite networks only) Larger clusters (by the chosen criterion) receive vivid colors first.

cutoff_insignificant

A numeric value or NULL (default). When specified, clusters with values at or below this threshold (based on the cluster_ordering criterion) are considered insignificant and colored black, reducing visual clutter on maps. If NULL, all clusters receive distinct colors.

Details

The function uses a two-step algorithm to assign colors: Step 1: Identify insignificant clusters (if cutoff_insignificant is specified) Insignificant clusters are those with a marginal size compared to others. This is a subjective threshold set by the user. All such clusters are assigned the color black (#000000) to minimize their visual impact. Clusters with values at or below the threshold are assigned black (#000000). Step 2: Assign colors to significant clusters Remaining clusters are ranked by the cluster_ordering criterion: Top clusters (up to 12): Receive distinct colors from the chosen palette. This limit is because above 12 the human eye struggles to distinguish between colors. Remaining clusters (beyond top 12): Receive shades of grey from light (#CCCCCC) to dark (#404040), maintaining visual distinction but with less prominence. Multiple partitions: If the cluster object contains multiple partitions (e.g., from hierarchical clustering with different k values), colors are assigned independently for each partition. Each partition gets its own color scale optimized for the number of clusters in that partition.

Value

A modified bioregion.clusters object with two additional elements: colors: A list where each element corresponds to a partition (bioregionalization). Each list element is a data.frame with two columns: cluster (character): Cluster identifier for that partition color (character): Hex color code (e.g., "#FF5733") clusters_colors: A data.frame with the same structure as the clusters element, but with cluster IDs replaced by their corresponding hex color codes for direct use in plotting functions.

Examples

data(fishmat) data(fishsf) # Basic example with few clusters sim <- similarity(fishmat, metric = "Simpson") clust <- netclu_greedy(sim) clust_colored <- bioregion_colors(clust) print(clust_colored) # Map with automatic colors map_bioregions(clust_colored, fishsf) # Example with many clusters and cutoff dissim <- similarity_to_dissimilarity(sim) clust <- hclu_hierarclust(dissim, optimal_tree_method = "best", n_clust = 15) clust_colored2 <- bioregion_colors(clust, cluster_ordering = "n_sites", cutoff_insignificant = 1) map_bioregions(clust_colored2, fishsf) # Example with different palette clust_colored3 <- bioregion_colors(clust, palette = "Bold") map_bioregions(clust_colored3, fishsf) # Example with bipartite network clust_bip <- netclu_greedy(fishdf, bipartite = TRUE) clust_bip_colored <- bioregion_colors(clust_bip, cluster_ordering = "n_both") map_bioregions(clust_bip_colored, fishsf)

See also

For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a5_1_visualization.html. Associated functions: map_bioregions

Note

The colored cluster object can be directly used with [=map_bioregions]map_bioregions(), which will automatically detect and apply the color scheme when present.

Author

Boris Leroy (leroy.boris@gmail.com) Maxime Lenormand (maxime.lenormand@inrae.fr) Pierre Denelle (pierre.denelle@gmail.com)

References

Color palettes from the rcartocolor package: Nowosad J (2018). "CARTOColors: color palettes inspired by CARTO." https://github.com/Nowosad/rcartocolor

bioregion_metrics

Calculate metrics for bioregions

CRAN · 1.4.0 · bioregion/man/bioregion_metrics.Rd · 2026-05-07

This function calculates the number of sites, the number of species, the number of endemic species and the proportion of endemism per bioregion. The spatial coherence can be optionally computed if a spatial object is provided.

Aliases

bioregion_metrics

Usage

bioregion_metrics(bioregionalization, comat, map = NULL, col_bioregion = NULL)

Arguments

bioregionalization

A bioregion.clusters object.

comat

A site-species matrix with sites as rows and species as columns.

map

A spatial object that can be handled by sf or terra. The first attribute or layer should correspond to the sites' ID (see Details). Needed only for the spatial coherence (NULL by default).

col_bioregion

Deprecated.

Details

map should be the output of map_bioregions(bioregionalization, geometry, write_clusters = TRUE)

Value

A data.frame with 5 columns (Bioregion ID and metrics described below) or 7 if spatial coherence is computed. NbSites: Number of sites per bioregion Richness: Number of distinct species per bioregion. Rich_Endemics: Number of species found only in the bioregion. Prop_Endemics: Fraction of endemics species. https://biorgeo.github.io/bioregion/articles/a5_2_summary_metrics.html#bioregion-metrics-spatial-coherenceSC_size: Spatial coherence based on size, fraction of the number of site contained in the bioregion's largest contiguous patch. https://biorgeo.github.io/bioregion/articles/a5_2_summary_metrics.html#bioregion-metrics-spatial-coherenceSC_area: Spatial coherence based on area, fraction of the bioregion area contained in its largest contiguous patch. Note that if bioregionalization contains multiple partitions (i.e., if dim(bioregionalization$clusters) > 2), a list will be returned.

Examples

comat <- matrix(sample(1000, 50), 5, 10) rownames(comat) <- paste0("Site", 1:5) colnames(comat) <- paste0("Species", 1:10) net <- similarity(comat, metric = "Simpson") clust <- netclu_louvain(net) bioregion_metrics(bioregionalization = clust, comat = comat)

See also

For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a5_2_summary_metrics.html. Associated functions: site_species_metrics bioregionalization_metrics

Author

Pierre Denelle (pierre.denelle@gmail.com) Boris Leroy (leroy.boris@gmail.com) Maxime Lenormand (maxime.lenormand@inrae.fr)

bioregionalization_metrics

Calculate metrics for one or several bioregionalizations

CRAN · 1.4.0 · bioregion/man/bioregionalization_metrics.Rd · 2026-05-07

This function calculates metrics for one or several bioregionalizations, typically based on outputs from netclu_, hclu_, or nhclu_ functions. Some metrics may require users to provide either a similarity or dissimilarity matrix, or the initial species-site table.

Aliases

bioregionalization_metrics

Usage

bioregionalization_metrics( bioregionalization, dissimilarity = NULL, dissimilarity_index = NULL, net = NULL, site_col = 1, species_col = 2, eval_metric = "all" )

Arguments

bioregionalization

A bioregion.clusters object.

dissimilarity

A dist object or a bioregion.pairwise object (output from [=similarity_to_dissimilarity]similarity_to_dissimilarity()). Required if eval_metric includes "pc_distance" and tree is not a bioregion.hierar.tree object.

dissimilarity_index

A character string indicating the dissimilarity (beta-diversity) index to use if dissimilarity is a data.frame with multiple dissimilarity indices.

net

The site-species network (i.e., bipartite network). Should be provided as a data.frame if eval_metric includes "avg_endemism" or "tot_endemism".

site_col

The name or index of the column representing site nodes (i.e., primary nodes). Should be provided if eval_metric includes "avg_endemism" or "tot_endemism".

species_col

The name or index of the column representing species nodes (i.e., feature nodes). Should be provided if eval_metric includes "avg_endemism" or "tot_endemism".

eval_metric

A character vector or a single character string indicating the metric(s) to be calculated to assess the effect of different numbers of clusters. Available options are "pc_distance", "anosim", "avg_endemism", or "tot_endemism". If "all" is specified, all metrics will be calculated.

Details

Evaluation metrics: pc_distance: This metric, as used by Holt et al. (2013), is the ratio of the between-cluster sum of dissimilarities (beta-diversity) to the total sum of dissimilarities for the full dissimilarity matrix. It is calculated in two steps: Compute the total sum of dissimilarities by summing all elements of the dissimilarity matrix. Compute the between-cluster sum of dissimilarities by setting within-cluster dissimilarities to zero and summing the matrix. The pc_distance ratio is obtained by dividing the between-cluster sum of dissimilarities by the total sum of dissimilarities. anosim: This metric is the statistic used in the Analysis of Similarities, as described in Castro-Insua et al. (2018). It compares between-cluster and within-cluster dissimilarities. The statistic is computed as: R = (r_B - r_W) / (N (N-1) / 4), where r_B and r_W are the average ranks of between-cluster and within-cluster dissimilarities, respectively, and N is the total number of sites. Note: This function does not estimate significance; for significance testing, use [vegan:anosim]vegan::anosim(). avg_endemism: This metric is the average percentage of endemism in clusters, as recommended by Kreft & Jetz (2010). It is calculated as: End_mean = sum_i (E_i / S_i) / K, where E_i is the number of endemic species in cluster i, S_i is the number of species in cluster i, and K is the total number of clusters. tot_endemism: This metric is the total endemism across all clusters, as recommended by Kreft & Jetz (2010). It is calculated as: End_tot = E / C, where E is the total number of endemic species (i.e., species found in only one cluster) and C is the number of non-endemic species.

Value

A list of class bioregion.bioregionalization.metrics with two to three elements: args: Input arguments. evaluation_df: A data.frame containing the eval_metric values for all explored numbers of clusters. endemism_results: If endemism calculations are requested, a list with the endemism results for each bioregionalization.

Examples

comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001), 20, 25) rownames(comat) <- paste0("Site",1:20) colnames(comat) <- paste0("Species",1:25) comnet <- mat_to_net(comat) dissim <- dissimilarity(comat, metric = "all") # User-defined number of clusters tree1 <- hclu_hierarclust(dissim, n_clust = 10:15, index = "Simpson") tree1 a <- bioregionalization_metrics(tree1, dissimilarity = dissim, net = comnet, site_col = "Node1", species_col = "Node2", eval_metric = c("tot_endemism", "avg_endemism", "pc_distance", "anosim"))

See also

For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a4_1_hierarchical_clustering.html#optimaln. Associated functions: compare_bioregionalizations find_optimal_n

Author

Boris Leroy (leroy.boris@gmail.com) Maxime Lenormand (maxime.lenormand@inrae.fr) Pierre Denelle (pierre.denelle@gmail.com)

References

Castro-Insua A, Gómez-Rodríguez C & Baselga A (2018) Dissimilarity measures affected by richness differences yield biased delimitations of biogeographic realms. Nature Communications 9, 9-11. Holt BG, Lessard J, Borregaard MK, Fritz SA, Araújo MB, Dimitrov D, Fabre P, Graham CH, Graves GR, Jønsson Ka, Nogués-Bravo D, Wang Z, Whittaker RJ, Fjeldså J & Rahbek C (2013) An update of Wallace's zoogeographic regions of the world. Science 339, 74-78. Kreft H & Jetz W (2010) A framework for delineating biogeographical regions based on species distributions. Journal of Biogeography 37, 2029-2053.

compare_bioregionalizations

Compare cluster memberships among multiple bioregionalizations

CRAN · 1.4.0 · bioregion/man/compare_bioregionalizations.Rd · 2026-05-07

This function computes pairwise comparisons for several bioregionalizations, usually outputs from netclu_, hclu_, or nhclu_ functions. It also provides the confusion matrix from pairwise comparisons, enabling the user to compute additional comparison metrics.

Aliases

compare_bioregionalizations

Usage

compare_bioregionalizations( bioregionalizations, indices = c("rand", "jaccard"), cor_frequency = FALSE, store_pairwise_membership = TRUE, store_confusion_matrix = TRUE, verbose = TRUE )

Arguments

bioregionalizations

A data.frame object where each row corresponds to a site, and each column to a bioregionalization.

indices

NULL or character. Indices to compute for the pairwise comparison of bioregionalizations. Currently available metrics are "rand" and "jaccard".

cor_frequency

A boolean. If TRUE, computes the correlation between each bioregionalization and the total frequency of co-membership of items across all bioregionalizations. This is useful for identifying which bioregionalization(s) is(are) most representative of all computed bioregionalizations.

store_pairwise_membership

A boolean. If TRUE, stores the pairwise membership of items in the output object.

store_confusion_matrix

A boolean. If TRUE, stores the confusion matrices of pairwise bioregionalization comparisons in the output object.

verbose

A boolean indicating whether to display progress messages. Set to FALSE to suppress these messages.

Details

This function operates in two main steps: Within each bioregionalization, the function compares all pairs of items and documents whether they are clustered together (TRUE) or separately (FALSE). For example, if site 1 and site 2 are clustered in the same cluster in bioregionalization 1, their pairwise membership site1_site2 will be TRUE. This output is stored in the pairwise_membership slot if store_pairwise_membership = TRUE. Across all bioregionalizations, the function compares their pairwise memberships to determine similarity. For each pair of bioregionalizations, it computes a confusion matrix with the following elements: a: Number of item pairs grouped in both bioregionalizations. b: Number of item pairs grouped in the first but not in the second bioregionalization. c: Number of item pairs grouped in the second but not in the first bioregionalization. d: Number of item pairs not grouped in either bioregionalization. The confusion matrix is stored in confusion_matrix if store_confusion_matrix = TRUE. Based on these confusion matrices, various indices can be computed to measure agreement among bioregionalizations. The currently implemented indices are: Rand index: (a + d) / (a + b + c + d) Measures agreement by considering both grouped and ungrouped item pairs. Jaccard index: a / (a + b + c) Measures agreement based only on grouped item pairs. These indices are complementary: the Jaccard index evaluates clustering similarity, while the Rand index considers both clustering and separation. For example, if two bioregionalizations never group the same pairs, their Jaccard index will be 0, but their Rand index may be > 0 due to ungrouped pairs. Users can compute additional indices manually using the list of confusion matrices. To identify which bioregionalization is most representative of the others, the function can compute the correlation between the pairwise membership of each bioregionalization and the total frequency of pairwise membership across all bioregionalizations. This is enabled by setting cor_frequency = TRUE.

Value

A list containing 4 to 7 elements: args: A list of user-provided arguments. inputs: A list containing information on the input bioregionalizations, such as the number of items clustered. pairwise_membership (optional): If store_pairwise_membership = TRUE, a boolean matrix where TRUE indicates two items are in the same cluster, and FALSE indicates they are not. freq_item_pw_membership: A numeric vector containing the number of times each item pair is clustered together, corresponding to the sum of rows in pairwise_membership. bioregionalization_freq_cor (optional): If cor_frequency = TRUE, a numeric vector of correlations between individual bioregionalizations and the total frequency of pairwise membership. confusion_matrix (optional): If store_confusion_matrix = TRUE, a list of confusion matrices for each pair of bioregionalizations. bioregionalization_comparison: A data.frame containing comparison results, where the first column indicates the bioregionalizations compared, and the remaining columns contain the requested indices.

Examples

# We here compare three different bioregionalizations comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001), 20, 25) rownames(comat) <- paste0("Site",1:20) colnames(comat) <- paste0("Species",1:25) dissim <- dissimilarity(comat, metric = "Simpson") bioregion1 <- nhclu_kmeans(dissim, n_clust = 3, index = "Simpson") net <- similarity(comat, metric = "Simpson") bioregion2 <- netclu_greedy(net) bioregion3 <- netclu_walktrap(net) # Make one single data.frame with the bioregionalizations to compare compare_df <- merge(bioregion1$clusters, bioregion2$clusters, by = "ID") compare_df <- merge(compare_df, bioregion3$clusters, by = "ID") colnames(compare_df) <- c("Site", "Hclu", "Greedy", "Walktrap") rownames(compare_df) <- compare_df$Site compare_df <- compare_df[, c("Hclu", "Greedy", "Walktrap")] # Running the function compare_bioregionalizations(compare_df) # Find out which bioregionalizations are most representative compare_bioregionalizations(compare_df, cor_frequency = TRUE)

See also

For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a5_3_compare_bioregionalizations.html. Associated functions: bioregionalization_metrics

Author

Boris Leroy (leroy.boris@gmail.com) Maxime Lenormand (maxime.lenormand@inrae.fr) Pierre Denelle (pierre.denelle@gmail.com)

cut_tree

Cut a hierarchical tree

CRAN · 1.4.0 · bioregion/man/cut_tree.Rd · 2026-05-07

This function is designed to work on a hierarchical tree and cut it at user-selected heights. It works with outputs from either hclu_hierarclust or hclust objects. The function allows for cutting the tree based on the chosen number(s) of clusters or specified height(s). Additionally, it includes a procedure to automatically determine the cutting height for the requested number(s) of clusters.

Aliases

cut_tree

Usage

cut_tree( tree, n_clust = NULL, cut_height = NULL, find_h = TRUE, h_max = 1, h_min = 0, dynamic_tree_cut = FALSE, dynamic_method = "tree", dynamic_minClusterSize = 5, dissimilarity = NULL, show_hierarchy = FALSE, verbose = TRUE, ... )

Arguments

tree

A bioregion.hierar.tree or an hclust object.

n_clust

An integer vector or a single integer indicating the number of clusters to be obtained from the hierarchical tree, or the output from [=bioregionalization_metrics]bioregionalization_metrics(). This should not be used concurrently with cut_height.

cut_height

A numeric vector specifying the height(s) at which the tree should be cut. This should not be used concurrently with n_clust or optim_method.

find_h

A boolean indicating whether the cutting height should be determined for the requested n_clust.

h_max

A numeric value indicating the maximum possible tree height for determining the cutting height when find_h = TRUE.

h_min

A numeric value specifying the minimum possible height in the tree for determining the cutting height when find_h = TRUE.

dynamic_tree_cut

A boolean indicating whether the dynamic tree cut method should be used. If TRUE, n_clust and cut_height are ignored.

dynamic_method

A character string specifying the method to be used for dynamically cutting the tree: either "tree" (clusters searched only within the tree) or "hybrid" (clusters searched in both the tree and the dissimilarity matrix).

dynamic_minClusterSize

An integer indicating the minimum cluster size for the dynamic tree cut method (see [dynamicTreeCut:cutreeDynamic]dynamicTreeCut::cutreeDynamic()).

dissimilarity

Relevant only if dynamic_method = "hybrid". Provide the dissimilarity data.frame used to build the tree.

show_hierarchy

A boolean specifying if the hierarchy of clusters should be identifiable in the outputs (FALSE by default).

verbose

A boolean indicating whether to display progress messages. Set to FALSE to suppress these messages.

...

Additional arguments passed to [dynamicTreeCut:cutreeDynamic]dynamicTreeCut::cutreeDynamic() to customize the dynamic tree cut method.

Details

The function supports two main methods for cutting the tree. First, the tree can be cut at a uniform height (specified by cut_height or determined automatically for the requested n_clust). Second, the dynamic tree cut method (Langfelder et al., 2008) can be applied, which adapts to the shape of branches in the tree, cutting at varying heights based on cluster positions. The dynamic tree cut method has two variants: The tree-based variant (dynamic_method = "tree") uses a top-down approach, relying solely on the tree and the order of clustered objects. The hybrid variant (dynamic_method = "hybrid") employs a bottom-up approach, leveraging both the tree and the dissimilarity matrix to identify clusters based on dissimilarity among sites. This approach is useful for detecting outliers within clusters.

Value

If tree is an output from [=hclu_hierarclust]hclu_hierarclust(), the same object is returned with updated content (i.e., args and clusters). If tree is an hclust object, a data.frame containing the clusters is returned.

Examples

comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001), 20, 25) rownames(comat) <- paste0("Site", 1:20) colnames(comat) <- paste0("Species", 1:25) simil <- similarity(comat, metric = "all") dissimilarity <- similarity_to_dissimilarity(simil) # User-defined number of clusters tree1 <- hclu_hierarclust(dissimilarity, n_clust = 5) tree2 <- cut_tree(tree1, cut_height = .05) tree3 <- cut_tree(tree1, n_clust = c(3, 5, 10)) tree4 <- cut_tree(tree1, cut_height = c(.05, .1, .15, .2, .25)) tree5 <- cut_tree(tree1, n_clust = c(3, 5, 10), find_h = FALSE) hclust_tree <- tree2$algorithm$final.tree clusters_2 <- cut_tree(hclust_tree, n_clust = 10) cluster_dynamic <- cut_tree(tree1, dynamic_tree_cut = TRUE, dissimilarity = dissimilarity)

See also

For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a4_1_hierarchical_clustering.html. Associated functions: hclu_hierarclust

Note

The find_h argument is ignored if dynamic_tree_cut = TRUE, as cutting heights cannot be determined in this case.

Author

Pierre Denelle (pierre.denelle@gmail.com) Maxime Lenormand (maxime.lenormand@inrae.fr) Boris Leroy (leroy.boris@gmail.com)

References

Langfelder P, Zhang B & Horvath S (2008) Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R. BIOINFORMATICS 24, 719-720.

dissimilarity

Compute dissimilarity metrics (beta-diversity) between sites based on species composition

CRAN · 1.4.0 · bioregion/man/dissimilarity.Rd · 2026-05-07

This function generates a data.frame where each row provides one or several dissimilarity metrics between pairs of sites, based on a co-occurrence matrix with sites as rows and species as columns.

Aliases

dissimilarity

Usage

dissimilarity(comat, metric = "Simpson", formula = NULL, method = "prodmat")

Arguments

comat

A co-occurrence matrix with sites as rows and species as columns.

metric

A character vector or a single character string specifying the metrics to compute (see Details). Available options are "abc", "ABC", "Jaccard", "Jaccardturn", "Sorensen", "Simpson", "Bray", "Brayturn", and "Euclidean". If "all" is specified, all metrics will be calculated. Can be set to NULL if formula is used.

formula

A character vector or a single character string specifying custom formula(s) based on the a, b, c, A, B, and C quantities (see Details). The default is NULL.

method

A character string specifying the method to compute abc (see Details). The default is "prodmat", which is more efficient but memory-intensive. Alternatively, "loops" is less memory-intensive but slower.

Details

With a the number of species shared by a pair of sites, b species only present in the first site and c species only present in the second site. Jaccard = (b + c) / (a + b + c) Jaccardturn = 2min(b, c) / (a + 2min(b, c)) (Baselga, 2012) Sorensen = (b + c) / (2a + b + c) Simpson = min(b, c) / (a + min(b, c)) If abundances data are available, Bray-Curtis and its turnover component can also be computed with the following equation: Bray = (B + C) / (2A + B + C) Brayturn = min(B, C)/(A + min(B, C)) (Baselga, 2013) with A the sum of the lesser values for common species shared by a pair of sites. B and C are the total number of specimens counted at both sites minus A. formula can be used to compute customized metrics with the terms a, b, c, A, B, and C. For example formula = c("pmin(b,c) / (a + pmin(b,c))", "(B + C) / (2*A + B + C)") will compute the Simpson and Bray-Curtis dissimilarity metrics, respectively. Note that pmin is used in the Simpson formula because a, b, c, A, B and C are numeric vectors. Euclidean computes the Euclidean distance between each pair of sites.

Value

A data.frame with the additional class bioregion.pairwise, containing one or several dissimilarity metrics between pairs of sites. The first two columns represent the pairs of sites. There is one column per similarity metric provided in metric and formula, except for the abc and ABC metrics, which are stored in three separate columns (one for each letter).

Examples

comat <- matrix(sample(0:1000, size = 50, replace = TRUE, prob = 1 / 1:1001), 5, 10) rownames(comat) <- paste0("s", 1:5) colnames(comat) <- paste0("sp", 1:10) dissim <- dissimilarity(comat, metric = c("abc", "ABC", "Simpson", "Brayturn")) dissim <- dissimilarity(comat, metric = "all", formula = "1 - (b + c) / (a + b + c)")

See also

Author

Maxime Lenormand (maxime.lenormand@inrae.fr) Pierre Denelle (pierre.denelle@gmail.com) Boris Leroy (leroy.boris@gmail.com)

References

Baselga, A. (2012) The Relationship between Species Replacement, Dissimilarity Derived from Nestedness, and Nestedness. Global Ecology and Biogeography, 21(12), 1223--1232. Baselga, A. (2013) Separating the two components of abundance-based dissimilarity: balanced changes in abundance vs. abundance gradients. Methods in Ecology and Evolution, 4(6), 552--557.

dissimilarity_to_similarity

Convert dissimilarity metrics to similarity metrics

CRAN · 1.4.0 · bioregion/man/dissimilarity_to_similarity.Rd · 2026-05-07

This function converts a data.frame of dissimilarity metrics (beta diversity) between sites into similarity metrics.

Aliases

dissimilarity_to_similarity

Usage

dissimilarity_to_similarity(dissimilarity, include_formula = TRUE)

Arguments

dissimilarity

the output object from [=dissimilarity]dissimilarity() or [=similarity_to_dissimilarity]similarity_to_dissimilarity().

include_formula

a boolean indicating whether metrics based on custom formula(s) should also be converted (see Details). The default is TRUE.

Value

A data.frame with the additional class bioregion.pairwise, providing similarity metrics for each pair of sites based on a dissimilarity object.

Examples

comat <- matrix(sample(0:1000, size = 50, replace = TRUE, prob = 1 / 1:1001), 5, 10) rownames(comat) <- paste0("s", 1:5) colnames(comat) <- paste0("sp", 1:10) dissimil <- dissimilarity(comat, metric = "all") dissimil similarity <- dissimilarity_to_similarity(dissimil) similarity

See also

Note

The behavior of this function changes depending on column names. Columns Site1 and Site2 are copied identically. If there are columns called a, b, c, A, B, C they will also be copied identically. If there are columns based on your own formula (argument formula in [=dissimilarity]dissimilarity()) or not in the original list of dissimilarity metrics (argument metrics in [=dissimilarity]dissimilarity()) and if the argument include_formula is set to FALSE, they will also be copied identically. Otherwise there are going to be converted like they other columns (default behavior). If a column is called Euclidean, the similarity will be calculated based on the following formula: Euclidean similarity = 1 / (1 - Euclidean distance) Otherwise, all other columns will be transformed into dissimilarity with the following formula: similarity = 1 - dissimilarity

Author

Maxime Lenormand (maxime.lenormand@inrae.fr) Boris Leroy (leroy.boris@gmail.com) Pierre Denelle (pierre.denelle@gmail.com)

exportGDF

Export a network to GDF format for Gephi visualization

CRAN · 1.4.0 · bioregion/man/exportGDF.Rd · 2026-05-07

This function exports a network (unipartite or bipartite) from a data.frame to the GDF (Graph Data Format) file format, which can be directly imported into Gephi visualization software. The function handles edge data, node attributes, and color specifications.

Aliases

exportGDF

Usage

exportGDF( df, col1 = "Node1", col2 = "Node2", weight = NULL, bioregions = NULL, bioregionalization = NULL, color_column = NULL, file = "output.gdf" )

Arguments

A two- or three-column data.frame where each row represents an edge (interaction) between two nodes. The first two columns contain the node identifiers, and an optional third column can contain edge weights.

col1

A character string specifying the name of the first column in df containing node identifiers. Defaults to "Node1".

col2

A character string specifying the name of the second column in df containing node identifiers. Defaults to "Node2".

weight

A character string specifying the name of the column in df containing edge weights. If NULL (default), edges are unweighted.

bioregions

An optional bioregion.clusters object (typically from clustering functions like [=netclu_greedy]netclu_greedy()) or a data.frame containing bioregionalization results. When a bioregion.clusters object with colors (from [=bioregion_colors]bioregion_colors()) is provided, colors and bioregion assignments are automatically extracted and used for visualization. Alternatively, a data.frame with bioregionalization data can be provided, where each row represents a node with one column containing node identifiers that match those in df.

bioregionalization

A character string or a positive integer with two different uses depending on the type of bioregions: When bioregions is a bioregion.clusters object with multiple partitions: specifies which partition to use. Can be either a character string with the partition name (e.g., "K_3", "K_5") or a positive integer indicating the partition index (e.g., 1 for first partition, 2 for second). If NULL (default), the first partition is used. When bioregions is a data.frame: specifies the name of the column containing node identifiers that match those in df. Must be a character string. Defaults to the first column name if not specified.

color_column

A character string specifying the name of a column in bioregions containing color information in hexadecimal format (e.g., "#FF5733"). If specified, colors will be converted to RGB format for Gephi. If NULL (default), colors are automatically extracted when bioregions is a bioregion.clusters object with colors. When bioregions is a plain data.frame, this parameter must be specified to include colors.

file

A character string specifying the output file path. Defaults to "output.gdf".

Details

The GDF format is a simple text-based format used by Gephi to define graph structure. This function creates a GDF file with two main sections: nodedef: Defines nodes and their attributes (name, label, and any additional bioregionalization information from bioregions) edgedef: Defines edges between nodes, optionally with weights If color_column is specified, hexadecimal color codes are automatically converted to RGB format (e.g., "#FF5733" becomes "255,87,51") as required by Gephi's color specification. Attributes are automatically typed as VARCHAR (text), DOUBLE (numeric), or color (for color attributes). Important note on zero-weight edges: Gephi does not handle edges with weight = 0 properly. If a weight column is specified and edges with weight = 0 are detected, they will be automatically removed from the exported network, and a warning will be issued.

Value

The function writes a GDF file to the specified path and returns nothing (NULL invisibly). The file can be directly opened in Gephi for network visualization and analysis.

Examples

# Create a simple network net <- data.frame( Node1 = c("A", "A", "B", "C"), Node2 = c("B", "C", "C", "D"), Weight = c(1.5, 2.0, 1.0, 3.5) ) # Export network with weights exportGDF(net, weight = "Weight", file = "my_network.gdf") # Create bioregionalization data with colors (as data.frame) bioregion_data <- data.frame( node_id = c("A", "B", "C", "D"), cluster = c("1", "2", "3", "4"), node_color = c("#FF5733", "#33FF57", "#3357FF", "#FF33F5") ) # Export network with bioregionalization and colors exportGDF(net, weight = "Weight", bioregions = bioregion_data, bioregionalization = "node_id", color_column = "node_color", file = "my_network_with_bioregions.gdf") # Using bioregion.clusters object with colors (recommended) data(fishmat) net <- similarity(fishmat, metric = "Simpson") clust <- netclu_greedy(net) clust_colored <- bioregion_colors(clust) # Convert to network format net_df <- mat_to_net(fishmat, weight = TRUE) # Export with automatic colors from clustering - very simple! exportGDF(net_df, weight = "weight", bioregions = clust_colored, file = "my_network_colored.gdf") # With multiple partitions, specify which one to use dissim <- similarity_to_dissimilarity(similarity(fishmat, metric = "Simpson")) clust_hier <- hclu_hierarclust(dissim, n_clust = c(3, 5, 8)) clust_hier_colored <- bioregion_colors(clust_hier) # Using partition name exportGDF(net_df, weight = "weight", bioregions = clust_hier_colored, bioregionalization = "K_5", file = "my_network_K5.gdf") # Or using partition index (2 = second partition) exportGDF(net_df, weight = "weight", bioregions = clust_hier_colored, bioregionalization = 2, file = "my_network_partition2.gdf")

Author

Boris Leroy (leroy.boris@gmail.com) Pierre Denelle (pierre.denelle@gmail.com) Maxime Lenormand (maxime.lenormand@inrae.fr)

find_optimal_n

Search for an optimal number of clusters in a list of bioregionalizations

CRAN · 1.4.0 · bioregion/man/find_optimal_n.Rd · 2026-05-07

This function aims to optimize one or several criteria on a set of ordered bioregionalizations. It is typically used to find one or more optimal cluster counts on hierarchical trees to cut or ranges of bioregionalizations from k-means or PAM. Users should exercise caution in other cases (e.g., unordered bioregionalizations or unrelated bioregionalizations).

Aliases

find_optimal_n

Usage

find_optimal_n( bioregionalizations, metrics_to_use = "all", criterion = "elbow", step_quantile = 0.99, step_levels = NULL, step_round_above = TRUE, metric_cutoffs = c(0.5, 0.75, 0.9, 0.95, 0.99, 0.999), n_breakpoints = 1, plot = TRUE, verbose = TRUE )

Arguments

bioregionalizations

A bioregion.bioregionalization.metrics object (output from [=bioregionalization_metrics]bioregionalization_metrics()) or a data.frame with the first two columns named K (bioregionalization name) and n_clusters (number of clusters), followed by columns with numeric evaluation metrics.

metrics_to_use

A character vector or single string specifying metrics in bioregionalizations for calculating optimal clusters. Defaults to "all" (uses all metrics).

criterion

A character string specifying the criterion to identify optimal clusters. Options include "elbow", "increasing_step", "decreasing_step", "cutoff", "breakpoints", "min", or "max". Defaults to "elbow". See Details.

step_quantile

For "increasing_step" or "decreasing_step", specifies the quantile of differences between consecutive bioregionalizations as the cutoff to identify significant steps in eval_metric.

step_levels

For "increasing_step" or "decreasing_step", specifies the number of largest steps to retain as cutoffs.

step_round_above

A boolean indicating whether the optimal clusters are above (TRUE) or below (FALSE) identified steps. Defaults to TRUE.

metric_cutoffs

For criterion = "cutoff", specifies the cutoffs of eval_metric to extract cluster counts.

n_breakpoints

Specifies the number of breakpoints to find in the curve. Defaults to 1.

plot

A boolean indicating if a plot of the first eval_metric with identified optimal clusters should be drawn.

verbose

A boolean indicating whether to display progress messages. Set to FALSE to suppress these messages.

Details

This function explores evaluation metric ~ cluster relationships, applying criteria to find optimal cluster counts. Note on criteria: Several criteria can return multiple optimal cluster counts, emphasizing hierarchical or nested bioregionalizations. This approach aligns with modern recommendations for biological datasets, as seen in Ficetola et al. (2017)'s reanalysis of Holt et al. (2013). Criteria for optimal clusters: elbow: Identifies the "elbow" point in the evaluation metric curve, where incremental improvements diminish. Based on a method to find the maximum distance from a straight line linking curve endpoints. increasing_step or decreasing_step: Highlights significant increases or decreases in metrics by analyzing pairwise differences between bioregionalizations. Users specify step_quantile or step_levels. cutoffs: Derives clusters from specified metric cutoffs, e.g., as in Holt et al. (2013). Adjust cutoffs based on spatial scale. breakpoints: Uses segmented regression to find breakpoints. Requires specifying n_breakpoints. min & max: Selects clusters at minimum or maximum metric values.

Value

A list of class bioregion.optimal.n with these elements: args: Input arguments. evaluation_df: The input evaluation data.frame, appended with boolean columns for optimal cluster counts. optimal_nb_clusters: A list with optimal cluster counts for each metric in "metrics_to_use", based on the chosen criterion. plot: The plot (if requested).

Examples

comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001), 20, 25) rownames(comat) <- paste0("Site",1:20) colnames(comat) <- paste0("Species",1:25) dissim <- dissimilarity(comat, metric = "all") # User-defined number of clusters tree <- hclu_hierarclust(dissim, optimal_tree_method = "best", n_clust = 5:10) tree a <- bioregionalization_metrics(tree, dissimilarity = dissim, species_col = "Node2", site_col = "Node1", eval_metric = "anosim") find_optimal_n(a, criterion = 'increasing_step', plot = FALSE)

See also

Note

Please note that finding the optimal number of clusters is a procedure which normally requires decisions from the users, and as such can hardly be fully automatized. Users are strongly advised to read the references indicated below to look for guidance on how to choose their optimal number(s) of clusters. Consider the "optimal" numbers of clusters returned by this function as first approximation of the best numbers for your bioregionalization.

Author

Boris Leroy (leroy.boris@gmail.com) Maxime Lenormand (maxime.lenormand@inrae.fr) Pierre Denelle (pierre.denelle@gmail.com)

References

Holt BG, Lessard J, Borregaard MK, Fritz SA, Araújo MB, Dimitrov D, Fabre P, Graham CH, Graves GR, Jønsson Ka, Nogués-Bravo D, Wang Z, Whittaker RJ, Fjeldså J & Rahbek C (2013) An update of Wallace's zoogeographic regions of the world. Science 339, 74-78. Ficetola GF, Mazel F & Thuiller W (2017) Global determinants of zoogeographical boundaries. Nature Ecology & Evolution 1, 0089.

fishdf

Spatial distribution of fish in Europe (data.frame)

CRAN · 1.4.0 · data · bioregion/man/fishdf.Rd · 2026-05-07

A dataset containing the abundance of 195 species in 338 sites.

Aliases

fishdf

Keywords

datasets

Usage

fishdf

Format

A data.frame with 2,703 rows and 3 columns: SiteUnique site identifier (corresponding to the field ID of fishsf) SpeciesUnique species identifier AbundanceSpecies abundance

fishmat

Spatial distribution of fish in Europe (co-occurrence matrix)

CRAN · 1.4.0 · data · bioregion/man/fishmat.Rd · 2026-05-07

A dataset containing the abundance of each of the 195 species in each of the 338 sites.

Aliases

fishmat

Keywords

datasets

Usage

fishmat

Format

A co-occurrence matrix with sites as rows and species as columns. Each element of the matrix represents the abundance of the species in the site.

fishsf

Spatial distribution of fish in Europe

CRAN · 1.4.0 · data · bioregion/man/fishsf.Rd · 2026-05-07

A dataset containing the geometry of the 338 sites.

Aliases

fishsf

Keywords

datasets

Usage

fishsf

Format

A sf data.frame with 338 rows and 2 columns: SiteUnique site identifier geometryGeometry of the site

hclu_diana

Divisive hierarchical clustering based on dissimilarity or beta-diversity

CRAN · 1.4.0 · bioregion/man/hclu_diana.Rd · 2026-05-07

This function computes a divisive hierarchical clustering from a dissimilarity (beta-diversity) data.frame, calculates the cophenetic correlation coefficient, and can generate clusters from the tree if requested by the user. The function implements randomization of the dissimilarity matrix to generate the tree, with a selection method based on the optimal cophenetic correlation coefficient. Typically, the dissimilarity data.frame is a bioregion.pairwise object obtained by running similarity or similarity followed by similarity_to_dissimilarity.

Aliases

hclu_diana

Usage

hclu_diana( dissimilarity, index = names(dissimilarity)[3], n_clust = NULL, cut_height = NULL, find_h = TRUE, h_max = 1, h_min = 0, verbose = TRUE )

Arguments

dissimilarity

The output object from [=dissimilarity]dissimilarity() or [=similarity_to_dissimilarity]similarity_to_dissimilarity(), or a dist object. If a data.frame is used, the first two columns represent pairs of sites (or any pair of nodes), and the remaining column(s) contain the dissimilarity indices.

index

The name or number of the dissimilarity column to use. By default, the third column name of dissimilarity is used.

n_clust

An integer vector or a single integer indicating the number of clusters to be obtained from the hierarchical tree, or the output from bioregionalization_metrics. Should not be used concurrently with cut_height.

cut_height

A numeric vector indicating the height(s) at which the tree should be cut. Should not be used concurrently with n_clust.

find_h

A boolean indicating whether the cutting height should be determined for the requested n_clust.

h_max

A numeric value indicating the maximum possible tree height for the chosen index.

h_min

A numeric value indicating the minimum possible height in the tree for the chosen index.

verbose

A boolean indicating whether to display progress messages. Set to FALSE to suppress these messages.

Details

The function is based on [cluster:diana]diana. Chapter 6 of Kaufman & Rousseeuw (1990) fully details the functioning of the diana algorithm. To find an optimal number of clusters, see [=bioregionalization_metrics]bioregionalization_metrics()

Value

Examples

comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001), 20, 25) rownames(comat) <- paste0("Site",1:20) colnames(comat) <- paste0("Species",1:25) dissim <- dissimilarity(comat, metric = "all") data("fishmat") fishdissim <- dissimilarity(fishmat) fish_diana <- hclu_diana(fishdissim, index = "Simpson")

See also

For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a4_1_hierarchical_clustering.html. Associated functions: cut_tree

Author

Pierre Denelle (pierre.denelle@gmail.com) Boris Leroy (leroy.boris@gmail.com) Maxime Lenormand (maxime.lenormand@inrae.fr)

References

Kaufman L & Rousseeuw PJ (2009) Finding groups in data: An introduction to cluster analysis. In & Sons. JW (ed.), Finding groups in data: An introduction to cluster analysis.

hclu_hierarclust

Hierarchical clustering based on dissimilarity or beta-diversity

CRAN · 1.4.0 · bioregion/man/hclu_hierarclust.Rd · 2026-05-07

This function generates a hierarchical tree from a dissimilarity (beta-diversity) data.frame, calculates the cophenetic correlation coefficient, and optionally retrieves clusters from the tree upon user request. The function includes a randomization process for the dissimilarity matrix to generate the tree, with two methods available for constructing the final tree. Typically, the dissimilarity data.frame is a bioregion.pairwise object obtained by running similarity, or by running similarity followed by similarity_to_dissimilarity.

Aliases

hclu_hierarclust

Usage

hclu_hierarclust( dissimilarity, index = names(dissimilarity)[3], method = "average", randomize = TRUE, seed = NULL, n_runs = 100, keep_trials = "no", optimal_tree_method = "iterative_consensus_tree", n_clust = NULL, cut_height = NULL, find_h = TRUE, h_max = 1, h_min = 0, consensus_p = 0.5, show_hierarchy = FALSE, verbose = TRUE )

Arguments

dissimilarity

The output object from [=dissimilarity]dissimilarity() or [=similarity_to_dissimilarity]similarity_to_dissimilarity(), or a dist object. If a data.frame is used, the first two columns represent pairs of sites (or any pair of nodes), and the subsequent column(s) contain the dissimilarity indices.

index

The name or number of the dissimilarity column to use. By default, the third column name of dissimilarity is used.

method

The name of the hierarchical classification method, as in [fastcluster:hclust]hclust. Should be one of "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC), or "centroid" (= UPGMC).

randomize

A boolean indicating whether the dissimilarity matrix should be randomized to account for the order of sites in the dissimilarity matrix.

seed

A value for the random number generator (NULL for random by default).

n_runs

The number of trials for randomizing the dissimilarity matrix.

keep_trials

A character string indicating whether random trial results (including the randomized matrix, the associated tree and metrics for that tree) should be stored in the output object. Possible values are "no" (default), "all" or "metrics". Note that this parameter is automatically set to "no" if optimal_tree_method = "iterative_consensus_tree".

optimal_tree_method

A character string indicating how the final tree should be obtained from all trials. Possible values are "iterative_consensus_tree" (default), "best" or "consensus". We recommend "iterative_consensus_tree". See Details.

n_clust

An integer vector or a single integer indicating the number of clusters to be obtained from the hierarchical tree, or the output from bioregionalization_metrics. This parameter should not be used simultaneously with cut_height.

cut_height

A numeric vector indicating the height(s) at which the tree should be cut. This parameter should not be used simultaneously with n_clust.

find_h

A boolean indicating whether the height of the cut should be found for the requested n_clust.

h_max

A numeric value indicating the maximum possible tree height for the chosen index.

h_min

A numeric value indicating the minimum possible height in the tree for the chosen index.

consensus_p

A numeric value (applicable only if optimal_tree_method = "consensus") indicating the threshold proportion of trees that must support a region/cluster for it to be included in the final consensus tree.

show_hierarchy

A boolean specifying if the hierarchy of clusters should be identifiable in the outputs (FALSE by default). This argument is only used if the tree is cut (i.e., n_clust or cut_height is provided).

verbose

A boolean indicating whether to display progress messages. Set to FALSE to suppress these messages.

Details

The function is based on [fastcluster:hclust]hclust. The default method for the hierarchical tree is average, i.e. UPGMA as it has been recommended as the best method to generate a tree from beta diversity dissimilarity (Kreft & Jetz, 2010). Clusters can be obtained by two methods: Specifying a desired number of clusters in n_clust Specifying one or several heights of cut in cut_height To find an optimal number of clusters, see [=bioregionalization_metrics]bioregionalization_metrics() It is important to pay attention to the fact that the order of rows in the input distance matrix influences the tree topology as explained in Dapporto (2013). To address this, the function generates multiple trees by randomizing the distance matrix. Two methods are available to obtain the final tree: optimal_tree_method = "iterative_consensus_tree": The Iterative Hierarchical Consensus Tree (IHCT) method reconstructs a consensus tree by iteratively splitting the dataset into two subclusters based on the pairwise dissimilarity of sites across n_runs trees based on n_runs randomizations of the distance matrix. At each iteration, it identifies the majority membership of sites into two stable groups across all trees, calculates the height based on the selected linkage method (method), and enforces monotonic constraints on node heights to produce a coherent tree structure. This approach provides a robust, hierarchical representation of site relationships, balancing cluster stability and hierarchical constraints. optimal_tree_method = "best": This method selects one tree among with the highest cophenetic correlation coefficient, representing the best fit between the hierarchical structure and the original distance matrix. optimal_tree_method = "consensus": This method constructs a consensus tree using phylogenetic methods with the function [ape:consensus]consensus. When using this option, you must set the consensus_p parameter, which indicates the proportion of trees that must contain a region/cluster for it to be included in the final consensus tree. Consensus trees lack an inherent height because they represent a majority structure rather than an actual hierarchical clustering. To assign heights, we use a non-negative least squares method ([phangorn:designTree]nnls.tree) based on the initial distance matrix, ensuring that the consensus tree preserves approximate distances among clusters. We recommend using the "iterative_consensus_tree" as all the branches of this tree will always reflect the majority decision among many randomized versions of the distance matrix. This method is inspired by Dapporto et al. (2015), which also used the majority decision among many randomized versions of the distance matrix, but it expands it to reconstruct the entire topology of the tree iteratively. We do not recommend using the basic consensus method because in many contexts it provides inconsistent results, with a meaningless tree topology and a very low cophenetic correlation coefficient. For a fast exploration of the tree, we recommend using the best method which will only select the tree with the highest cophenetic correlation coefficient among all randomized versions of the distance matrix.

Value

A list of class bioregion.clusters with five slots: name: A character string containing the name of the algorithm. args: A list of input arguments as provided by the user. inputs: A list describing the characteristics of the clustering process. algorithm: A list containing all objects associated with the clustering procedure, such as the original cluster objects. clusters: A data.frame containing the clustering results. In the algorithm slot, users can find the following elements: trials: A list containing all randomization trials. Each trial includes the dissimilarity matrix with randomized site order, the associated tree, and the cophenetic correlation coefficient for that tree. final.tree: An hclust object representing the final hierarchical tree to be used. final.tree.coph.cor: The cophenetic correlation coefficient between the initial dissimilarity matrix and the final.tree.

Examples

comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001), 20, 25) rownames(comat) <- paste0("Site",1:20) colnames(comat) <- paste0("Species",1:25) dissim <- dissimilarity(comat, metric = "Simpson") # User-defined number of clusters tree1 <- hclu_hierarclust(dissim, n_clust = 5) tree1 plot(tree1) str(tree1) tree1$clusters # User-defined height cut # Only one height tree2 <- hclu_hierarclust(dissim, cut_height = .05) tree2 tree2$clusters # Multiple heights tree3 <- hclu_hierarclust(dissim, cut_height = c(.05, .15, .25)) tree3$clusters # Mind the order of height cuts: from deep to shallow cuts # Info on each partition can be found in table cluster_info tree3$cluster_info plot(tree3)

See also

For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a4_1_hierarchical_clustering.html. Associated functions: cut_tree

Author

Boris Leroy (leroy.boris@gmail.com) Pierre Denelle (pierre.denelle@gmail.com) Maxime Lenormand (maxime.lenormand@inrae.fr)

References

Kreft H & Jetz W (2010) A framework for delineating biogeographical regions based on species distributions. Journal of Biogeography 37, 2029-2053. Dapporto L, Ramazzotti M, Fattorini S, Talavera G, Vila R & Dennis, RLH (2013) Recluster: an unbiased clustering procedure for beta-diversity turnover. Ecography 36, 1070--1075. Dapporto L, Ciolli G, Dennis RLH, Fox R & Shreeve TG (2015) A new procedure for extrapolating turnover regionalization at mid-small spatial scales, tested on British butterflies. Methods in Ecology and Evolution 6 , 1287--1297.

hclu_optics

OPTICS hierarchical clustering algorithm

CRAN · 1.4.0 · bioregion/man/hclu_optics.Rd · 2026-05-07

This function performs semi-hierarchical clustering based on dissimilarity using the OPTICS algorithm (Ordering Points To Identify the Clustering Structure).

Aliases

hclu_optics

Usage

hclu_optics( dissimilarity, index = names(dissimilarity)[3], minPts = NULL, eps = NULL, xi = 0.05, minimum = FALSE, show_hierarchy = FALSE, algorithm_in_output = TRUE, ... )

Arguments

dissimilarity

The output object from [=dissimilarity]dissimilarity() or [=similarity_to_dissimilarity]similarity_to_dissimilarity(), or a dist object. If a data.frame is used, the first two columns represent pairs of sites (or any pair of nodes), and the subsequent column(s) contain the dissimilarity indices.

index

The name or number of the dissimilarity column to use. By default, the third column name of dissimilarity is used.

minPts

A numeric value specifying the minPts argument of [dbscan:dbscan]dbscan. minPts is the minimum number of points required to form a dense region. By default, it is set to the natural logarithm of the number of sites in dissimilarity.

eps

A numeric value specifying the eps argument of [dbscan:optics]optics. It defines the upper limit of the size of the epsilon neighborhood. Limiting the neighborhood size improves performance and has no or very little impact on the ordering as long as it is not set too low. If not specified (default behavior), the largest minPts-distance in the dataset is used, which gives the same result as infinity.

A numeric value specifying the steepness threshold to identify clusters hierarchically using the Xi method (see [dbscan:optics]optics).

minimum

A boolean specifying whether the hierarchy should be pruned from the output to only retain clusters at the "minimal" level, i.e., only leaf / non-overlapping clusters. If TRUE, then the argument show_hierarchy should be set to FALSE.

show_hierarchy

A boolean specifying whether the hierarchy of clusters should be included in the output. By default, the hierarchy is not visible in the clusters obtained from OPTICS; it can only be visualized by plotting the OPTICS object. If show_hierarchy = TRUE, the output cluster data.frame will contain additional columns showing the hierarchy of clusters.

algorithm_in_output

A boolean indicating whether the original output of [dbscan:dbscan]dbscan should be returned in the output (TRUE by default, see Value).

...

Additional arguments to be passed to optics() (see [dbscan:optics]optics).

Details

The OPTICS (Ordering points to identify the clustering structure) is a semi-hierarchical clustering algorithm which orders the points in the dataset such that points which are closest become neighbors, and calculates a reachability distance for each point. Then, clusters can be extracted in a hierarchical manner from this reachability distance, by identifying clusters depending on changes in the relative cluster density. The reachability plot should be explored to understand the clusters and their hierarchical nature, by running plot on the output of the function if algorithm_in_output = TRUE: plot(object$algorithm). We recommend reading (Hahsler et al., 2019) to grasp the algorithm, how it works, and what the clusters mean. To extract the clusters, we use the [dbscan:optics]extractXi function which is based on the steepness of the reachability plot (see [dbscan:optics]optics)

Value

Examples

dissim <- dissimilarity(fishmat, metric = "all") clust1 <- hclu_optics(dissim, index = "Simpson") clust1 # Visualize the optics plot (the hierarchy of clusters is illustrated at the # bottom) plot(clust1$algorithm) # Extract the hierarchy of clusters clust1 <- hclu_optics(dissim, index = "Simpson", show_hierarchy = TRUE) clust1

See also

For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a4_1_hierarchical_clustering.html. Associated functions: nhclu_dbscan

Author

Boris Leroy (leroy.boris@gmail.com) Pierre Denelle (pierre.denelle@gmail.com) Maxime Lenormand (maxime.lenormand@inrae.fr)

References

Hahsler M, Piekenbrock M & Doran D (2019) Dbscan: Fast density-based clustering with R. Journal of Statistical Software 91, 1--30.

install_binaries

Download, unzip, check permissions, and test the bioregion's binary files

CRAN · 1.4.0 · bioregion/man/install_binaries.Rd · 2026-05-07

This function downloads and unzips the 'bin' folder required to run certain functions of the bioregion package. It also verifies if the files have the necessary permissions to be executed as programs. Finally, it tests whether the binary files are running correctly.

Aliases

install_binaries

Usage

install_binaries( binpath = "tempdir", download_only = FALSE, infomap_version = c("2.1.0", "2.6.0", "2.7.1", "2.8.0"), verbose = TRUE )

Arguments

binpath

A character string specifying the path to the folder that will host the bin folder containing the binary files (see Details).

download_only

A logical value indicating whether the function should only download the bin.zip file or perform the entire process (see Details).

infomap_version

A character vector or a single character string specifying the Infomap version(s) to install.

verbose

A boolean indicating whether to display progress messages. Set to FALSE to suppress these messages.

Details

By default, the binary files are installed in R's temporary directory (binpath = "tempdir"). In this case, the bin folder will be automatically removed at the end of the R session. Alternatively, the binary files can be installed in the bioregion package folder (binpath = "pkgfolder"). A custom folder path can also be specified. In this case, and only in this case, download_only can be set to TRUE, but you must ensure that the files have the required permissions to be executed as programs. In all cases, PLEASE MAKE SURE to update the binpath and check_install parameters accordingly in netclu_infomap, netclu_louvain, and netclu_oslom.

Value

No return value.

See also

For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a1_install_binary_files.html.

Note

Currently, only Infomap versions 2.1.0, 2.6.0, 2.7.1, and 2.8.0 are available.

Author

Maxime Lenormand (maxime.lenormand@inrae.fr) Boris Leroy (leroy.boris@gmail.com) Pierre Denelle (pierre.denelle@gmail.com)

map_bioregions

Create a map of bioregions

CRAN · 1.4.0 · bioregion/man/map_bioregions.Rd · 2026-05-07

This plot function can be used to visualize bioregions based on a bioregion.clusters object combined with a spatial object (sf or terra).

Aliases

map_bioregions

Usage

map_bioregions( bioregionalization, map, partition_index = NULL, map_as_output = FALSE, plot = TRUE, clusters = NULL, geometry = NULL, write_clusters = NULL, ... )

Arguments

bioregionalization

A bioregion.clusters object.

map

A spatial object that can be handled by sf or terra. The first attribute or layer should correspond to the sites' ID (see Details).

partition_index

An integer, character, or NULL specifying which bioregionalization's partition(s) to plot. By default (NULL), all partitions are plotted. If an integer or vector of integers is provided, partition(s) are selected by column number(s) in the bioregionalization data.frame (starting from 1 after the ID column). If a character or vector of characters, partition(s) are selected by name(s) matching column names in bioregionalization.

map_as_output

A boolean indicating if the sf data.frame object used for the plot should be returned.

plot

A boolean indicating if the plot should be drawn.

clusters

Deprecated. Use bioregionalization instead. The former bioregionalization has been replaced by partition_index.

geometry

Deprecated. Use map instead.

write_clusters

Deprecated. Use map_as_output instead.

...

Further arguments to be passed to sf::plot().

Details

The site IDs in bioregionalization and map should correspond. They must have the same type (i.e., character if bioregionalization is a bioregion.clusters object), and the sites in bioregionalization should be included among the sites in map. If map is an sf or a SpatVector (terra) object, it should contain an attribute table with the IDs in the first column. If map is a SpatRaster (terra) object, it should contain the IDs in the first layer. If the bioregionalization object contains both types of nodes (sites and species), only site will be mapped. The function automatically filters to site nodes using the node_type attribute. Colors: If the bioregionalization object contains colors (added via bioregion_colors()), these colors will be automatically used for plotting. Otherwise, the default sf color scheme will be applied.

Value

One or several maps of bioregions if plot = TRUE and the sf data.frame object used for the plot if map_as_output = TRUE.

Examples

data(fishmat) data(fishsf) net <- similarity(fishmat, metric = "Simpson") clu <- netclu_greedy(net) mapclu <- map_bioregions(clu, map = fishsf, map_as_output = TRUE, plot = FALSE) # With colors clu_colored <- bioregion_colors(clu) mapclu <- map_bioregions(clu_colored, map = fishsf, map_as_output = TRUE, plot = FALSE)

See also

For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a5_1_visualization.html. Associated functions: bioregion_colors

Author

Maxime Lenormand (maxime.lenormand@inrae.fr) Boris Leroy (leroy.boris@gmail.com) Pierre Denelle (pierre.denelle@gmail.com)

mat_to_net

Create a data.frame from a contingency table

CRAN · 1.4.0 · bioregion/man/mat_to_net.Rd · 2026-05-07

This function generates a two- or three-column data.frame, where each row represents the interaction between two nodes (e.g., site and species) and an optional third column indicates the weight of the interaction (if weight = TRUE). The input is a contingency table, with rows representing one set of entities (e.g., site) and columns representing another set (e.g., species).

Aliases

mat_to_net

Usage

mat_to_net( mat, weight = FALSE, remove_zeroes = TRUE, include_diag = TRUE, include_lower = TRUE )

Arguments

mat

A contingency table (i.e., a matrix).

weight

A logical value indicating whether the values in the matrix should be interpreted as interaction weights.

remove_zeroes

A logical value determining whether interactions with a weight equal to 0 should be excluded from the output.

include_diag

A logical value indicating whether the diagonal (self-interactions) should be included in the output. This applies only to square matrices.

include_lower

A logical value indicating whether the lower triangular part of the matrix should be included in the output. This applies only to square matrices.

Value

A data.frame where each row represents the interaction between two nodes. If weight = TRUE, the data.frame includes a third column representing the weight of each interaction.

Examples

mat <- matrix(sample(1000, 50), 5, 10) rownames(mat) <- paste0("Site", 1:5) colnames(mat) <- paste0("Species", 1:10) net <- mat_to_net(mat, weight = TRUE)

See also

For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a2_matrix_and_network_formats.html. Associated functions: net_to_mat

Author

Maxime Lenormand (maxime.lenormand@inrae.fr) Pierre Denelle (pierre.denelle@gmail.com) Boris Leroy (leroy.boris@gmail.com)

net_to_mat

Create a contingency table from a data.frame

CRAN · 1.4.0 · bioregion/man/net_to_mat.Rd · 2026-05-07

This function generates a contingency table from a two- or three-column data.frame, where each row represents the interaction between two nodes (e.g., site and species) and an optional third column indicates the weight of the interaction (if weight = TRUE).

Aliases

net_to_mat

Usage

net_to_mat( net, weight = FALSE, squared = FALSE, symmetrical = FALSE, missing_value = 0 )

Arguments

net

A two- or three-column data.frame where each row represents the interaction between two nodes (e.g., site and species), with an optional third column indicating the weight of the interaction.

weight

A logical value indicating whether the weight column should be considered.

squared

A logical value indicating whether the output matrix should be square (i.e., containing the same nodes in rows and columns).

symmetrical

A logical value indicating whether the resulting matrix should be symmetrical. This applies only if squared = TRUE. Note that different weights associated with opposite pairs already present in net will be preserved.

missing_value

The value to assign to pairs of nodes not present in net. Defaults to 0.

Value

A matrix with the first nodes (from the first column of net) as rows and the second nodes (from the second column of net) as columns. If squared = TRUE, the rows and columns will have the same number of elements, corresponding to the unique union of objects in the first and second columns of net. If squared = TRUE and symmetrical = TRUE, the matrix will be forced to be symmetrical based on the upper triangular part of the matrix.

Examples

net <- data.frame( Site = c(rep("A", 2), rep("B", 3), rep("C", 2)), Species = c("a", "b", "a", "c", "d", "b", "d"), Weight = c(10, 100, 1, 20, 50, 10, 20) ) mat <- net_to_mat(net, weight = TRUE)

See also

For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a2_matrix_and_network_formats.html. Associated functions: mat_to_net

Author

Maxime Lenormand (maxime.lenormand@inrae.fr) Pierre Denelle (pierre.denelle@gmail.com) Boris Leroy (leroy.boris@gmail.com)

netclu_beckett

Community structure detection in weighted bipartite networks via modularity optimization

CRAN · 1.4.0 · bioregion/man/netclu_beckett.Rd · 2026-05-07

This function takes a bipartite weighted graph and computes modules by applying Newman’s modularity measure in a bipartite weighted version.

Aliases

netclu_beckett

Usage

netclu_beckett( net, weight = TRUE, cut_weight = 0, index = names(net)[3], seed = NULL, forceLPA = FALSE, site_col = 1, species_col = 2, return_node_type = "both", algorithm_in_output = TRUE )

Arguments

net

A data.frame representing a bipartite network with the first two columns representing undirected links between pairs of nodes, and the next column(s) representing the weights of the links.

weight

A boolean indicating whether weights should be considered if there are more than two columns (see Note).

cut_weight

A minimal weight value. If weight is TRUE, links with weights strictly lower than this value will not be considered (0 by default).

index

The name or number of the column to use as weight. By default, the third column name of net is used.

seed

The seed for the random number generator (NULL for random by default).

forceLPA

A boolean indicating whether the even faster pure LPA-algorithm of Beckett should be used. DIRT-LPA (the default) is less likely to get trapped in a local minimum but is slightly slower. Defaults to FALSE.

site_col

The name or number of the column for site nodes (i.e., primary nodes).

species_col

The name or number of the column for species nodes (i.e., feature nodes).

return_node_type

A character indicating which types of nodes ("site", "species", or "both") should be returned in the output ("both" by default).

algorithm_in_output

A boolean indicating whether the original output of [bipartite:computeModules]computeModules should be returned in the output (TRUE by default, see Value).

Details

This function is based on the modularity optimization algorithm provided by Stephen Beckett (Beckett, 2016) as implemented in the https://cran.r-project.org/package=bipartitebipartite package ([bipartite:computeModules]computeModules).

Value

A list of class bioregion.clusters with five slots: name: A character containing the name of the algorithm. args: A list of input arguments as provided by the user. inputs: A list of characteristics of the clustering process. algorithm: A list of all objects associated with the clustering procedure, such as original cluster objects (only if algorithm_in_output = TRUE). clusters: A data.frame containing the clustering results. If algorithm_in_output = TRUE, users can find the output of [bipartite:computeModules]computeModules in the algorithm slot.

Examples

net <- data.frame( Site = c(rep("A", 2), rep("B", 3), rep("C", 2)), Species = c("a", "b", "a", "c", "d", "b", "d"), Weight = c(10, 100, 1, 20, 50, 10, 20)) com <- netclu_beckett(net)

See also

For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a4_3_network_clustering.html. Associated functions: netclu_infomap netclu_louvain netclu_oslom

Note

Beckett's algorithm is designed to handle weighted bipartite networks. If weight = FALSE, a weight of 1 will be assigned to each pair of nodes. Ensure that the site_col and species_col arguments correctly identify the respective columns for site nodes (primary nodes) and species nodes (feature nodes). The type of nodes returned in the output can be selected using the return_node_type argument: "both" to include both node types, "site" to return only site nodes, or "species" to return only species nodes.

Author

Maxime Lenormand (maxime.lenormand@inrae.fr) Pierre Denelle (pierre.denelle@gmail.com) Boris Leroy (leroy.boris@gmail.com)

References

Beckett SJ (2016) Improved community detection in weighted bipartite networks. Royal Society Open Science 3, 140536.

netclu_greedy

Community structure detection via greedy optimization of modularity

CRAN · 1.4.0 · bioregion/man/netclu_greedy.Rd · 2026-05-07

This function finds communities in a (un)weighted undirected network via greedy optimization of modularity.

Aliases

netclu_greedy

Usage

netclu_greedy( net, weight = TRUE, cut_weight = 0, index = names(net)[3], bipartite = FALSE, site_col = 1, species_col = 2, return_node_type = "both", algorithm_in_output = TRUE )

Arguments

net

The output object from [=similarity]similarity() or [=dissimilarity_to_similarity]dissimilarity_to_similarity(). If a data.frame is used, the first two columns represent pairs of sites (or any pair of nodes), and the next column(s) are the similarity indices.

weight

A boolean indicating if the weights should be considered if there are more than two columns.

cut_weight

A minimal weight value. If weight is TRUE, the links between sites with a weight strictly lower than this value will not be considered (0 by default).

index

The name or number of the column to use as weight. By default, the third column name of net is used.

bipartite

A boolean indicating if the network is bipartite (see Details).

site_col

The name or number for the column of site nodes (i.e. primary nodes).

species_col

The name or number for the column of species nodes (i.e. feature nodes).

return_node_type

A character indicating what types of nodes (site, species or both) should be returned in the output (return_node_type = "both" by default).

algorithm_in_output

A boolean indicating if the original output of [igraph:cluster_fast_greedy]cluster_fast_greedy should be returned in the output (TRUE by default, see Value).

Details

This function is based on the fast greedy modularity optimization algorithm (Clauset et al., 2004) as implemented in the https://cran.r-project.org/package=igraphigraph package ([igraph:cluster_fast_greedy]cluster_fast_greedy).

Value

A list of class bioregion.clusters with five slots: name: character containing the name of the algorithm args: list of input arguments as provided by the user inputs: list of characteristics of the clustering process algorithm: list of all objects associated with the clustering procedure, such as original cluster objects (only if algorithm_in_output = TRUE) clusters: data.frame containing the clustering results In the algorithm slot, if algorithm_in_output = TRUE, users can find the output of [igraph:cluster_fast_greedy]cluster_fast_greedy.

Examples

comat <- matrix(sample(1000, 50), 5, 10) rownames(comat) <- paste0("Site", 1:5) colnames(comat) <- paste0("Species", 1:10) net <- similarity(comat, metric = "Simpson") com <- netclu_greedy(net) net_bip <- mat_to_net(comat, weight = TRUE) clust2 <- netclu_greedy(net_bip, bipartite = TRUE)

See also

Note

Although this algorithm was not primarily designed to deal with bipartite network, it is possible to consider the bipartite network as unipartite network (bipartite = TRUE). Do not forget to indicate which of the first two columns is dedicated to the site nodes (i.e. primary nodes) and species nodes (i.e. feature nodes) using the arguments site_col and species_col. The type of nodes returned in the output can be chosen with the argument return_node_type equal to both to keep both types of nodes, sites to preserve only the sites nodes and species to preserve only the species nodes.

Author

Maxime Lenormand (maxime.lenormand@inrae.fr) Pierre Denelle (pierre.denelle@gmail.com) Boris Leroy (leroy.boris@gmail.com)

References

Clauset A, Newman MEJ & Moore C (2004) Finding community structure in very large networks. Phys. Rev. E 70, 066111.

netclu_infomap

Infomap community finding

CRAN · 1.4.0 · bioregion/man/netclu_infomap.Rd · 2026-05-07

This function finds communities in a (un)weighted (un)directed network based on the Infomap algorithm (https://github.com/mapequation/infomap).

Aliases

netclu_infomap

Usage

netclu_infomap( net, weight = TRUE, cut_weight = 0, index = names(net)[3], seed = NULL, nbmod = 0, markovtime = 1, numtrials = 1, twolevel = FALSE, show_hierarchy = FALSE, directed = FALSE, bipartite_version = FALSE, bipartite = FALSE, site_col = 1, species_col = 2, return_node_type = "both", version = "2.8.0", binpath = "tempdir", check_install = TRUE, path_temp = "infomap_temp", delete_temp = TRUE )

Arguments

net

weight

A boolean indicating if the weights should be considered if there are more than two columns.

cut_weight

A minimal weight value. If weight is TRUE, the links between sites with a weight strictly lower than this value will not be considered (0 by default).

index

The name or number of the column to use as weight. By default, the third column name of net is used.

seed

The seed for the random number generator (NULL for random by default).

nbmod

Penalize solutions the more they differ from this number (0 by default for no preferred number of modules).

markovtime

Scales link flow to change the cost of moving between modules, higher values result in fewer modules (1 by default).

numtrials

For the number of trials before picking up the best solution.

twolevel

A boolean indicating if the algorithm should optimize a two-level partition of the network (FALSE by default for multi-level).

show_hierarchy

A boolean specifying if the hierarchy of community should be identifiable in the outputs (FALSE by default).

directed

A boolean indicating if the network is directed (from column 1 to column 2).

bipartite_version

A boolean indicating if the bipartite version of Infomap should be used (see Note).

bipartite

A boolean indicating if the network is bipartite (see Note).

site_col

The name or number for the column of site nodes (i.e. primary nodes).

species_col

The name or number for the column of species nodes (i.e. feature nodes).

return_node_type

A character indicating what types of nodes ("site", "species", or "both") should be returned in the output ("both" by default).

version

A character indicating the Infomap version to use.

binpath

A character indicating the path to the bin folder (see install_binaries and Details).

check_install

A boolean indicating if the function should check that the Infomap has been properly installed (see install_binaries and Details).

path_temp

A character indicating the path to the temporary folder (see Details).

delete_temp

A boolean indicating if the temporary folder should be removed (see Details).

Details

Infomap is a network clustering algorithm based on the Map equation proposed in Rosvall & Bergstrom (2008) that finds communities in (un)weighted and (un)directed networks. This function is based on the C++ version of Infomap (https://github.com/mapequation/infomap/releases). This function needs binary files to run. They can be installed with install_binaries. If you changed the default path to the bin folder while running install_binaries PLEASE MAKE SURE to set binpath accordingly. If you did not use install_binaries to change the permissions and test the binary files PLEASE MAKE SURE to set check_install accordingly. The C++ version of Infomap generates temporary folders and/or files that are stored in the path_temp folder ("infomap_temp" with a unique timestamp located in the bin folder in binpath by default). This temporary folder is removed by default (delete_temp = TRUE). Several versions of Infomap are available in the package. See install_binaries for more details.

Value

A list of class bioregion.clusters with five slots: name: A character containing the name of the algorithm. args: A list of input arguments as provided by the user. inputs: A list of characteristics of the clustering process. algorithm: A list of all objects associated with the clustering procedure, such as original cluster objects. clusters: A data.frame containing the clustering results. In the algorithm slot, users can find the following elements: cmd: The command line used to run Infomap. version: The Infomap version. web: Infomap's GitHub repository.

Examples

comat <- matrix(sample(1000, 50), 5, 10) rownames(comat) <- paste0("Site", 1:5) colnames(comat) <- paste0("Species", 1:10) net <- similarity(comat, metric = "Simpson") com <- netclu_infomap(net)

See also

Note

Infomap has been designed to deal with bipartite networks. To use this functionality, set the bipartite_version argument to TRUE in order to approximate a two-step random walker (see https://www.mapequation.org/infomap/ for more information). Note that a bipartite network can also be considered as a unipartite network (bipartite = TRUE). In both cases, do not forget to indicate which of the first two columns is dedicated to the site nodes (i.e., primary nodes) and species nodes (i.e. feature nodes) using the arguments site_col and species_col. The type of nodes returned in the output can be chosen with the argument return_node_type equal to "both" to keep both types of nodes, "site" to preserve only the site nodes, and "species" to preserve only the species nodes.

Author

Maxime Lenormand (maxime.lenormand@inrae.fr) Pierre Denelle (pierre.denelle@gmail.com) Boris Leroy (leroy.boris@gmail.com)

References

Rosvall M & Bergstrom CT (2008) Maps of random walks on complex networks reveal community structure. Proceedings of the National Academy of Sciences 105, 1118-1123.

netclu_labelprop

Finding communities based on propagating labels

CRAN · 1.4.0 · bioregion/man/netclu_labelprop.Rd · 2026-05-07

This function finds communities in a (un)weighted undirected network based on propagating labels.

Aliases

netclu_labelprop

Usage

netclu_labelprop( net, weight = TRUE, cut_weight = 0, index = names(net)[3], seed = NULL, bipartite = FALSE, site_col = 1, species_col = 2, return_node_type = "both", algorithm_in_output = TRUE )

Arguments

net

weight

A boolean indicating if the weights should be considered if there are more than two columns.

cut_weight

A minimal weight value. If weight is TRUE, the links between sites with a weight strictly lower than this value will not be considered (0 by default).

index

The name or number of the column to use as weight. By default, the third column name of net is used.

seed

The seed for the random number generator (NULL for random by default).

bipartite

A boolean indicating if the network is bipartite (see Details).

site_col

The name or number for the column of site nodes (i.e. primary nodes).

species_col

The name or number for the column of species nodes (i.e. feature nodes).

return_node_type

A character indicating what types of nodes ("site", "species", or "both") should be returned in the output ("both" by default).

algorithm_in_output

A boolean indicating if the original output of [igraph:cluster_label_prop]cluster_label_prop should be returned in the output (TRUE by default, see Value).

Details

This function is based on propagating labels (Raghavan et al., 2007) as implemented in the https://cran.r-project.org/package=igraphigraph package ([igraph:cluster_label_prop]cluster_label_prop).

Value

A list of class bioregion.clusters with five slots: name: A character containing the name of the algorithm. args: A list of input arguments as provided by the user. inputs: A list of characteristics of the clustering process. algorithm: A list of all objects associated with the clustering procedure, such as original cluster objects (only if algorithm_in_output = TRUE). clusters: A data.frame containing the clustering results. In the algorithm slot, if algorithm_in_output = TRUE, users can find a "communities" object, output of [igraph:cluster_label_prop]cluster_label_prop.

Examples

comat <- matrix(sample(1000, 50), 5, 10) rownames(comat) <- paste0("Site", 1:5) colnames(comat) <- paste0("Species", 1:10) net <- similarity(comat, metric = "Simpson") com <- netclu_labelprop(net) net_bip <- mat_to_net(comat, weight = TRUE) clust2 <- netclu_labelprop(net_bip, bipartite = TRUE)

See also

Note

Although this algorithm was not primarily designed to deal with bipartite networks, it is possible to consider the bipartite network as a unipartite network (bipartite = TRUE). Do not forget to indicate which of the first two columns is dedicated to the site nodes (i.e., primary nodes) and species nodes (i.e. feature nodes) using the arguments site_col and species_col. The type of nodes returned in the output can be chosen with the argument return_node_type equal to "both" to keep both types of nodes, "site" to preserve only the site nodes, and "species" to preserve only the species nodes.

Author

Maxime Lenormand (maxime.lenormand@inrae.fr) Pierre Denelle (pierre.denelle@gmail.com) Boris Leroy (leroy.boris@gmail.com)

References

Raghavan UN, Albert R & Kumara S (2007) Near linear time algorithm to detect community structures in large-scale networks. Physical Review E 76, 036106.

netclu_leadingeigen

Finding communities based on the leading eigenvector of the community matrix

CRAN · 1.4.0 · bioregion/man/netclu_leadingeigen.Rd · 2026-05-07

This function finds communities in a (un)weighted undirected network based on the leading eigenvector of the community matrix.

Aliases

netclu_leadingeigen

Usage

netclu_leadingeigen( net, weight = TRUE, cut_weight = 0, index = names(net)[3], bipartite = FALSE, site_col = 1, species_col = 2, return_node_type = "both", algorithm_in_output = TRUE )

Arguments

net

weight

A boolean indicating if the weights should be considered if there are more than two columns.

cut_weight

A minimal weight value. If weight is TRUE, the links between sites with a weight strictly lower than this value will not be considered (0 by default).

index

The name or number of the column to use as weight. By default, the third column name of net is used.

bipartite

A boolean indicating if the network is bipartite (see Details).

site_col

The name or number for the column of site nodes (i.e., primary nodes).

species_col

The name or number for the column of species nodes (i.e., feature nodes).

return_node_type

A character indicating what types of nodes ("site", "species", or "both") should be returned in the output ("both" by default).

algorithm_in_output

A boolean indicating if the original output of [igraph:cluster_leading_eigen]cluster_leading_eigen should be returned in the output (TRUE by default, see Value).

Details

This function is based on the leading eigenvector of the community matrix (Newman, 2006) as implemented in the https://cran.r-project.org/package=igraphigraph package ([igraph:cluster_leading_eigen]cluster_leading_eigen).

Value

A list of class bioregion.clusters with five slots: name: A character containing the name of the algorithm. args: A list of input arguments as provided by the user. inputs: A list of characteristics of the clustering process. algorithm: A list of all objects associated with the clustering procedure, such as original cluster objects (only if algorithm_in_output = TRUE). clusters: A data.frame containing the clustering results. In the algorithm slot, if algorithm_in_output = TRUE, users can find the output of [igraph:cluster_leading_eigen]cluster_leading_eigen.

Examples

comat <- matrix(sample(1000, 50), 5, 10) rownames(comat) <- paste0("Site", 1:5) colnames(comat) <- paste0("Species", 1:10) net <- similarity(comat, metric = "Simpson") com <- netclu_leadingeigen(net) net_bip <- mat_to_net(comat, weight = TRUE) clust2 <- netclu_leadingeigen(net_bip, bipartite = TRUE)

See also

Note

Author

Maxime Lenormand (maxime.lenormand@inrae.fr) Pierre Denelle (pierre.denelle@gmail.com) Boris Leroy (leroy.boris@gmail.com)

References

Newman MEJ (2006) Finding community structure in networks using the eigenvectors of matrices. Physical Review E 74, 036104.

netclu_leiden

Finding communities using the Leiden algorithm

CRAN · 1.4.0 · bioregion/man/netclu_leiden.Rd · 2026-05-07

This function finds communities in a (un)weighted undirected network based on the Leiden algorithm of Traag, van Eck & Waltman.

Aliases

netclu_leiden

Usage

netclu_leiden( net, weight = TRUE, cut_weight = 0, index = names(net)[3], seed = NULL, objective_function = "CPM", resolution_parameter = 1, beta = 0.01, n_iterations = 2, vertex_weights = NULL, bipartite = FALSE, site_col = 1, species_col = 2, return_node_type = "both", algorithm_in_output = TRUE )

Arguments

net

weight

A boolean indicating if the weights should be considered if there are more than two columns.

cut_weight

A minimal weight value. If weight is TRUE, the links between sites with a weight strictly lower than this value will not be considered (0 by default).

index

The name or number of the column to use as weight. By default, the third column name of net is used.

seed

The random number generator seed (NULL for random by default).

objective_function

A string indicating the objective function to use, either the Constant Potts Model ("CPM") or "modularity" ("CPM" by default).

resolution_parameter

The resolution parameter to use. Higher resolutions lead to smaller communities, while lower resolutions lead to larger communities.

beta

A parameter affecting the randomness in the Leiden algorithm. This affects only the refinement step of the algorithm.

n_iterations

The number of iterations for the Leiden algorithm. Each iteration may further improve the partition.

vertex_weights

The vertex weights used in the Leiden algorithm. If not provided, they will be automatically determined based on the objective_function. Please see the details of this function to understand how to interpret the vertex weights.

bipartite

A boolean indicating if the network is bipartite (see Details).

site_col

The name or number for the column of site nodes (i.e., primary nodes).

species_col

The name or number for the column of species nodes (i.e., feature nodes).

return_node_type

A character indicating what types of nodes ("site", "species", or "both") should be returned in the output ("both" by default).

algorithm_in_output

A boolean indicating if the original output of [igraph:cluster_leiden]cluster_leiden should be returned in the output (TRUE by default, see Value).

Details

This function is based on the Leiden algorithm (Traag et al., 2019) as implemented in the https://cran.r-project.org/package=igraphigraph package ([igraph:cluster_leiden]cluster_leiden).

Value

A list of class bioregion.clusters with five slots: name: A character containing the name of the algorithm. args: A list of input arguments as provided by the user. inputs: A list of characteristics of the clustering process. algorithm: A list of all objects associated with the clustering procedure, such as original cluster objects (only if algorithm_in_output = TRUE). clusters: A data.frame containing the clustering results. In the algorithm slot, if algorithm_in_output = TRUE, users can find the output of [igraph:cluster_leiden]cluster_leiden.

Examples

comat <- matrix(sample(1000, 50), 5, 10) rownames(comat) <- paste0("Site", 1:5) colnames(comat) <- paste0("Species", 1:10) net <- similarity(comat, metric = "Simpson") com <- netclu_leiden(net) net_bip <- mat_to_net(comat, weight = TRUE) clust2 <- netclu_leiden(net_bip, bipartite = TRUE)

See also

Note

Author

Maxime Lenormand (maxime.lenormand@inrae.fr) Pierre Denelle (pierre.denelle@gmail.com) Boris Leroy (leroy.boris@gmail.com)

References

Traag VA, Waltman L & Van Eck NJ (2019) From Louvain to Leiden: guaranteeing well-connected communities. Scientific reports 9, 5233.

netclu_louvain

Louvain community finding

CRAN · 1.4.0 · bioregion/man/netclu_louvain.Rd · 2026-05-07

This function finds communities in a (un)weighted undirected network based on the Louvain algorithm.

Aliases

netclu_louvain

Usage

netclu_louvain( net, weight = TRUE, cut_weight = 0, index = names(net)[3], lang = "igraph", resolution = 1, seed = NULL, q = 0, c = 0.5, k = 1, bipartite = FALSE, site_col = 1, species_col = 2, return_node_type = "both", binpath = "tempdir", check_install = TRUE, path_temp = "louvain_temp", delete_temp = TRUE, algorithm_in_output = TRUE )

Arguments

net

weight

A boolean indicating if the weights should be considered if there are more than two columns.

cut_weight

A minimal weight value. If weight is TRUE, the links between sites with a weight strictly lower than this value will not be considered (0 by default).

index

The name or number of the column to use as weight. By default, the third column name of net is used.

lang

A string indicating which version of Louvain should be used ("igraph" or "cpp", see Details).

resolution

A resolution parameter to adjust the modularity (1 is chosen by default, see Details).

seed

The random number generator seed (only when lang = "igraph", NULL for random by default).

The quality function used to compute the partition of the graph (modularity is chosen by default, see Details).

The parameter for the Owsinski-Zadrozny quality function (between 0 and 1, 0.5 is chosen by default).

The kappa_min value for the Shi-Malik quality function (it must be > 0, 1 is chosen by default).

bipartite

A boolean indicating if the network is bipartite (see Details).

site_col

The name or number for the column of site nodes (i.e., primary nodes).

species_col

The name or number for the column of species nodes (i.e., feature nodes).

return_node_type

A character indicating what types of nodes ("site", "species", or "both") should be returned in the output ("both" by default).

binpath

A character indicating the path to the bin folder (see install_binaries and Details).

check_install

A boolean indicating if the function should check that Louvain has been properly installed (see install_binaries and Details).

path_temp

A character indicating the path to the temporary folder (see Details).

delete_temp

A boolean indicating if the temporary folder should be removed (see Details).

algorithm_in_output

A boolean indicating if the original output of [igraph:cluster_louvain]cluster_louvain should be returned in the output (TRUE by default, see Value).

Details

Louvain is a network community detection algorithm proposed in (Blondel et al., 2008). This function offers two implementations of the Louvain algorithm (controlled by the lang parameter): the https://cran.r-project.org/package=igraphigraph implementation ([igraph:cluster_louvain]cluster_louvain) and the C++ implementation (https://sourceforge.net/projects/louvain/, version 0.3). The https://cran.r-project.org/package=igraphigraph implementation allows adjustment of the resolution parameter of the modularity function (resolution argument) used internally by the algorithm. Lower values typically yield fewer, larger clusters. The original definition of modularity is recovered when the resolution parameter is set to 1 (by default). The C++ implementation provides several quality functions: q = 0 for the classical Newman-Girvan criterion (Modularity), q = 1 for the Zahn-Condorcet criterion, q = 2 for the Owsinski-Zadrozny criterion (parameterized by c), q = 3 for the Goldberg Density criterion, q = 4 for the A-weighted Condorcet criterion, q = 5 for the Deviation to Indetermination criterion, q = 6 for the Deviation to Uniformity criterion, q = 7 for the Profile Difference criterion, q = 8 for the Shi-Malik criterion (parameterized by k), and q = 9 for the Balanced Modularity criterion. The C++ version is based on version 0.3 (https://sourceforge.net/projects/louvain/). Binary files are required to run it, and can be installed with install_binaries. If you changed the default path to the bin folder while running install_binaries, PLEASE MAKE SURE to set binpath accordingly. If you did not use install_binaries to change the permissions or test the binary files, PLEASE MAKE SURE to set check_install accordingly. The C++ version generates temporary folders and/or files in the path_temp folder ("louvain_temp" with a unique timestamp located in the bin folder in binpath by default). This temporary folder is removed by default (delete_temp = TRUE).

Value

A list of class bioregion.clusters with five slots: name: A character containing the name of the algorithm. args: A list of input arguments as provided by the user. inputs: A list of characteristics of the clustering process. algorithm: A list of all objects associated with the clustering procedure, such as original cluster objects (only if algorithm_in_output = TRUE). clusters: A data.frame containing the clustering results. In the algorithm slot, if algorithm_in_output = TRUE, users can find the output of [igraph:cluster_louvain]cluster_louvain if lang = "igraph" and the following element if lang = "cpp": cmd: The command line used to run Louvain. version: The Louvain version. web: The Louvain's website.

Examples

comat <- matrix(sample(1000, 50), 5, 10) rownames(comat) <- paste0("Site", 1:5) colnames(comat) <- paste0("Species", 1:10) net <- similarity(comat, metric = "Simpson") com <- netclu_louvain(net, lang = "igraph")

See also

Note

Although this algorithm was not primarily designed to deal with bipartite networks, it is possible to consider the bipartite network as a unipartite network (bipartite = TRUE). Do not forget to indicate which of the first two columns is dedicated to the site nodes (i.e., primary nodes) and species nodes (i.e., feature nodes) using the arguments site_col and species_col. The type of nodes returned in the output can be chosen with the argument return_node_type equal to "both" to keep both types of nodes, "site" to preserve only the site nodes, and "species" to preserve only the species nodes.

Author

Maxime Lenormand (maxime.lenormand@inrae.fr) Pierre Denelle (pierre.denelle@gmail.com) Boris Leroy (leroy.boris@gmail.com)

References

Blondel VD, Guillaume JL, Lambiotte R & Mech ELJS (2008) Fast unfolding of communities in large networks. J. Stat. Mech. 10, P10008.

netclu_oslom

OSLOM community finding

CRAN · 1.4.0 · bioregion/man/netclu_oslom.Rd · 2026-05-07

This function finds communities in a (un)weighted (un)directed network based on the OSLOM algorithm (http://oslom.org/, version 2.4).

Aliases

netclu_oslom

Usage

netclu_oslom( net, weight = TRUE, cut_weight = 0, index = names(net)[3], seed = NULL, reassign = "no", r = 10, hr = 50, t = 0.1, cp = 0.5, directed = FALSE, bipartite = FALSE, site_col = 1, species_col = 2, return_node_type = "both", binpath = "tempdir", check_install = TRUE, path_temp = "oslom_temp", delete_temp = TRUE )

Arguments

net

weight

A boolean indicating if the weights should be considered if there are more than two columns.

cut_weight

A minimal weight value. If weight is TRUE, the links between sites with a weight strictly lower than this value will not be considered (0 by default).

index

Name or number of the column to use as weight. By default, the third column name of net is used.

seed

For the random number generator (NULL for random by default).

reassign

A character indicating if the nodes belonging to several community should be reassigned and what method should be used (see Note).

The number of runs for the first hierarchical level (10 by default).

The number of runs for the higher hierarchical level (50 by default, 0 if you are not interested in hierarchies).

The p-value, the default value is 0.10. Increase this value if you want more modules.

Kind of resolution parameter used to decide between taking some modules or their union (default value is 0.5; a bigger value leads to bigger clusters).

directed

A boolean indicating if the network is directed (from column 1 to column 2).

bipartite

A boolean indicating if the network is bipartite (see Details).

site_col

Name or number for the column of site nodes (i.e. primary nodes).

species_col

Name or number for the column of species nodes (i.e. feature nodes).

return_node_type

A character indicating what types of nodes (site, species, or both) should be returned in the output (return_node_type = "both" by default).

binpath

A character indicating the path to the bin folder (see install_binaries and Details).

check_install

A boolean indicating if the function should check that the OSLOM has been properly installed (see install_binaries and Details).

path_temp

A character indicating the path to the temporary folder (see Details).

delete_temp

A boolean indicating if the temporary folder should be removed (see Details).

Details

OSLOM is a network community detection algorithm proposed in Lancichinetti et al. (2011) that finds statistically significant (overlapping) communities in (un)weighted and (un)directed networks. This function is based on the 2.4 C++ version of OSLOM (http://www.oslom.org/software.htm). This function needs files to run. They can be installed with install_binaries. If you changed the default path to the bin folder while running install_binaries, PLEASE MAKE SURE to set binpath accordingly. If you did not use install_binaries to change the permissions and test the binary files, PLEASE MAKE SURE to set check_install accordingly. The C++ version of OSLOM generates temporary folders and/or files that are stored in the path_temp folder (folder "oslom_temp" with a unique timestamp located in the bin folder in binpath by default). This temporary folder is removed by default (delete_temp = TRUE).

Value

A list of class bioregion.clusters with five slots: name: A character containing the name of the algorithm. args: A list of input arguments as provided by the user. inputs: A list of characteristics of the clustering process. algorithm: A list of all objects associated with the clustering procedure, such as original cluster objects (only if algorithm_in_output = TRUE). clusters: A data.frame containing the clustering results. In the algorithm slot, users can find the following elements: cmd: The command line used to run OSLOM. version: The OSLOM version. web: The OSLOM's web site.

Examples

comat <- matrix(sample(1000, 50), 5, 10) rownames(comat) <- paste0("Site", 1:5) colnames(comat) <- paste0("Species", 1:10) net <- similarity(comat, metric = "Simpson") com <- netclu_oslom(net)

See also

Note

Although this algorithm was not primarily designed to deal with bipartite networks, it is possible to consider the bipartite network as unipartite network (bipartite = TRUE). Do not forget to indicate which of the first two columns is dedicated to the site nodes (i.e. primary nodes) and species nodes (i.e. feature nodes) using the arguments site_col and species_col. The type of nodes returned in the output can be chosen with the argument return_node_type equal to both to keep both types of nodes, sites to preserve only the sites nodes, and species to preserve only the species nodes. Since OSLOM potentially returns overlapping communities, we propose two methods to reassign the 'overlapping' nodes: randomly (reassign = "random") or based on the closest candidate community (reassign = "simil") (only for weighted networks, in this case the closest candidate community is determined with the average similarity). By default, reassign = "no" and all the information will be provided. The number of partitions will depend on the number of overlapping modules (up to three). The suffix _semel, _bis, and _ter are added to the column names. The first partition (_semel) assigns a module to each node. A value of NA in the second (_bis) and third (_ter) columns indicates that no overlapping module was found for this node (i.e. non-overlapping nodes).

Author

Maxime Lenormand (maxime.lenormand@inrae.fr) Pierre Denelle (pierre.denelle@gmail.com) Boris Leroy (leroy.boris@gmail.com)

References

Lancichinetti A, Radicchi F, Ramasco JJ & Fortunato S (2011) Finding statistically significant communities in networks. PLOS ONE 6, e18961.

netclu_walktrap

Community structure detection via short random walks

CRAN · 1.4.0 · bioregion/man/netclu_walktrap.Rd · 2026-05-07

This function finds communities in a (un)weighted undirected network via short random walks.

Aliases

netclu_walktrap

Usage

netclu_walktrap( net, weight = TRUE, cut_weight = 0, index = names(net)[3], steps = 4, bipartite = FALSE, site_col = 1, species_col = 2, return_node_type = "both", algorithm_in_output = TRUE )

Arguments

net

weight

A boolean indicating if the weights should be considered if there are more than two columns.

cut_weight

A minimal weight value. If weight is TRUE, the links between sites with a weight strictly lower than this value will not be considered (0 by default).

index

Name or number of the column to use as weight. By default, the third column name of net is used.

steps

The length of the random walks to perform.

bipartite

A boolean indicating if the network is bipartite (see Details).

site_col

Name or number for the column of site nodes (i.e. primary nodes).

species_col

Name or number for the column of species nodes (i.e. feature nodes).

return_node_type

A character indicating what types of nodes (site, species, or both) should be returned in the output (return_node_type = "both" by default).

algorithm_in_output

A boolean indicating if the original output of [igraph:cluster_walktrap]cluster_walktrap should be returned in the output (TRUE by default, see Value).

Details

This function is based on random walks (Pons & Latapy, 2005) as implemented in the https://cran.r-project.org/package=igraphigraph package ([igraph:cluster_walktrap]cluster_walktrap).

Value

A list of class bioregion.clusters with five slots: name: A character containing the name of the algorithm. args: A list of input arguments as provided by the user. inputs: A list of characteristics of the clustering process. algorithm: A list of all objects associated with the clustering procedure, such as original cluster objects (only if algorithm_in_output = TRUE). clusters: A data.frame containing the clustering results. In the algorithm slot, if algorithm_in_output = TRUE, users can find the output of [igraph:cluster_walktrap]cluster_walktrap.

Examples

comat <- matrix(sample(1000, 50), 5, 10) rownames(comat) <- paste0("Site", 1:5) colnames(comat) <- paste0("Species", 1:10) net <- similarity(comat, metric = "Simpson") com <- netclu_walktrap(net) net_bip <- mat_to_net(comat, weight = TRUE) clust2 <- netclu_walktrap(net_bip, bipartite = TRUE)

See also

Note

Author

Maxime Lenormand (maxime.lenormand@inrae.fr) Pierre Denelle (pierre.denelle@gmail.com) Boris Leroy (leroy.boris@gmail.com)

References

Pons P & Latapy M (2005) Computing Communities in Large Networks Using Random Walks. In Yolum I, Güngör T, Gürgen F, Özturan C (eds.), Computer and Information Sciences - ISCIS 2005, Lecture Notes in Computer Science, 284-293.

nhclu_affprop

Non-hierarchical clustering: Affinity Propagation

CRAN · 1.4.0 · bioregion/man/nhclu_affprop.Rd · 2026-05-07

This function performs non-hierarchical clustering using the Affinity Propagation algorithm.

Aliases

nhclu_affprop

Usage

nhclu_affprop( similarity, index = names(similarity)[3], seed = NULL, p = NA, q = NA, maxits = 1000, convits = 100, lam = 0.9, details = FALSE, nonoise = FALSE, K = NULL, prc = NULL, bimaxit = NULL, exact = NULL, algorithm_in_output = TRUE, verbose = TRUE )

Arguments

similarity

The output object from [=similarity]similarity() or [=dissimilarity_to_similarity]dissimilarity_to_similarity(), or a dist object. If a data.frame is used, the first two columns should represent pairs of sites (or any pair of nodes), and the subsequent column(s) should contain the similarity indices.

index

The name or number of the similarity column to use. By default, the third column name of similarity is used.

seed

The seed for the random number generator used when nonoise = FALSE.

Input preference, which can be a vector specifying individual preferences for each data point. If scalar, the same value is used for all data points. If NA, exemplar preferences are initialized based on the distribution of non-Inf values in the similarity matrix, controlled by q.

If p = NA, exemplar preferences are initialized according to the distribution of non-Inf values in the similarity matrix. By default, the median is used. A value between 0 and 1 specifies the sample quantile, where q = 0.5 results in the median.

maxits

The maximum number of iterations to execute.

convits

The algorithm terminates if the exemplars do not change for convits iterations.

lam

The damping factor, a value in the range [0.5, 1). Higher values correspond to heavier damping, which may help prevent oscillations.

details

If TRUE, detailed information about the algorithm's progress is stored in the output object.

nonoise

If TRUE, disables the addition of a small amount of noise to the similarity object, which prevents degenerate cases.

The desired number of clusters. If not NULL, the function [apcluster:apclusterK-methods]apclusterK is called.

prc

A parameter needed when K is not NULL. The algorithm stops if the number of clusters deviates by less than prc percent from the desired value K. Set to 0 to enforce exactly K clusters.

bimaxit

A parameter needed when K is not NULL. Specifies the maximum number of bisection steps to perform. No warning is issued if the number of clusters remains outside the desired range.

exact

A flag indicating whether to compute the initial preference range exactly.

algorithm_in_output

A boolean indicating whether to include the original output of [apcluster:apcluster-methods]apcluster in the result. Defaults to TRUE.

verbose

A boolean indicating whether to display progress messages. Set to FALSE to suppress these messages.

Details

This function is based on the https://cran.r-project.org/package=apclusterapcluster package ([apcluster:apcluster-methods]apcluster).

Value

Examples

comat_1 <- matrix(sample(0:1000, size = 10*12, replace = TRUE, prob = 1/1:1001), 10, 12) rownames(comat_1) <- paste0("Site", 1:10) colnames(comat_1) <- paste0("Species", 1:12) comat_1 <- cbind(comat_1, matrix(0, 10, 8, dimnames = list(paste0("Site", 1:10), paste0("Species", 13:20)))) comat_2 <- matrix(sample(0:1000, size = 10*12, replace = TRUE, prob = 1/1:1001), 10, 12) rownames(comat_2) <- paste0("Site", 11:20) colnames(comat_2) <- paste0("Species", 9:20) comat_2 <- cbind(matrix(0, 10, 8, dimnames = list(paste0("Site", 11:20), paste0("Species", 1:8))), comat_2) comat <- rbind(comat_1, comat_2) dissim <- dissimilarity(comat, metric = "Simpson") sim <- dissimilarity_to_similarity(dissim) clust1 <- nhclu_affprop(sim) clust2 <- nhclu_affprop(sim, q = 1) # Fixed number of clusters clust3 <- nhclu_affprop(sim, K = 2, prc = 10, bimaxit = 20, exact = FALSE)

See also

For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a4_2_non_hierarchical_clustering.html. Associated functions: nhclu_clara nhclu_clarans nhclu_dbscan nhclu_kmeans nhclu_affprop

Author

Pierre Denelle (pierre.denelle@gmail.com) Boris Leroy (leroy.boris@gmail.com) Maxime Lenormand (maxime.lenormand@inrae.fr)

References

Frey B & Dueck D (2007) Clustering by Passing Messages Between Data Points. Science 315, 972-976.

nhclu_clara

Non-hierarchical clustering: CLARA

CRAN · 1.4.0 · bioregion/man/nhclu_clara.Rd · 2026-05-07

This function performs non-hierarchical clustering based on dissimilarity using partitioning around medoids, implemented via the Clustering Large Applications (CLARA) algorithm.

Aliases

nhclu_clara

Usage

nhclu_clara( dissimilarity, index = names(dissimilarity)[3], seed = NULL, n_clust = c(1, 2, 3), maxiter = 0, initializer = "LAB", fasttol = 1, numsamples = 5, sampling = 0.25, independent = FALSE, algorithm_in_output = TRUE )

Arguments

dissimilarity

The output object from [=dissimilarity]dissimilarity() or [=similarity_to_dissimilarity]similarity_to_dissimilarity(), or a dist object. If a data.frame is used, the first two columns should represent pairs of sites (or any pair of nodes), and the subsequent column(s) should contain the dissimilarity indices.

index

The name or number of the dissimilarity column to use. By default, the third column name of dissimilarity is used.

seed

A value for the random number generator (set to NULL for random initialization by default).

n_clust

An integer vector or a single integer specifying the desired number(s) of clusters.

maxiter

An integer defining the maximum number of iterations.

initializer

A character string, either "BUILD" (used in the classic PAM algorithm) or "LAB" (Linear Approximate BUILD).

fasttol

A positive numeric value defining the tolerance for fast swapping behavior. Defaults to 1.

numsamples

A positive integer specifying the number of samples to draw.

sampling

A positive numeric value defining the sampling rate.

independent

A boolean indicating whether the previous medoids are excluded in the next sample. Defaults to FALSE.

algorithm_in_output

A boolean indicating whether the original output of [fastkmedoids:fastclara]fastclara should be included in the output. Defaults to TRUE (see Value).

Details

Based on https://cran.r-project.org/package=fastkmedoidsfastkmedoids package ([fastkmedoids:fastclara]fastclara).

Value

A list of class bioregion.clusters with five components: name: A character string containing the name of the algorithm. args: A list of input arguments as provided by the user. inputs: A list of characteristics of the clustering process. algorithm: A list of all objects associated with the clustering procedure, such as original cluster objects (only if algorithm_in_output = TRUE). clusters: A data.frame containing the clustering results. If algorithm_in_output = TRUE, the algorithm slot includes the output of [fastkmedoids:fastclara]fastclara.

Examples

comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001), 20, 25) rownames(comat) <- paste0("Site",1:20) colnames(comat) <- paste0("Species",1:25) dissim <- dissimilarity(comat, metric = "all") #clust <- nhclu_clara(dissim, index = "Simpson", n_clust = 5)

See also

For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a4_2_non_hierarchical_clustering.html. Associated functions: nhclu_clarans nhclu_dbscan nhclu_kmeans nhclu_pam nhclu_affprop

Author

Pierre Denelle (pierre.denelle@gmail.com) Boris Leroy (leroy.boris@gmail.com) Maxime Lenormand (maxime.lenormand@inrae.fr)

References

Schubert E & Rousseeuw PJ (2019) Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms. Similarity Search and Applications 11807, 171-187.

nhclu_clarans

Non-hierarchical clustering: CLARANS

CRAN · 1.4.0 · bioregion/man/nhclu_clarans.Rd · 2026-05-07

This function performs non-hierarchical clustering based on dissimilarity using partitioning around medoids, implemented via the Clustering Large Applications based on RANdomized Search (CLARANS) algorithm.

Aliases

nhclu_clarans

Usage

nhclu_clarans( dissimilarity, index = names(dissimilarity)[3], seed = NULL, n_clust = c(1, 2, 3), numlocal = 2, maxneighbor = 0.025, algorithm_in_output = TRUE )

Arguments

dissimilarity

The output object from [=dissimilarity]dissimilarity() or [=similarity_to_dissimilarity]similarity_to_dissimilarity(), or a dist object. If a data.frame is used, the first two columns should represent pairs of sites (or any pair of nodes), and the subsequent column(s) should contain the dissimilarity indices.

index

The name or number of the dissimilarity column to use. By default, the third column name of dissimilarity is used.

seed

A value for the random number generator (NULL for random initialization by default).

n_clust

An integer vector or a single integer specifying the desired number(s) of clusters.

numlocal

An integer defining the number of local searches to perform.

maxneighbor

A positive numeric value defining the maximum number of neighbors to consider for each local search.

algorithm_in_output

A boolean indicating whether the original output of [fastkmedoids:fastclarans]fastclarans should be included in the output. Defaults to TRUE (see Value).

Details

Based on https://cran.r-project.org/package=fastkmedoidsfastkmedoids package ([fastkmedoids:fastclarans]fastclarans).

Value

Examples

comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001), 20, 25) rownames(comat) <- paste0("Site",1:20) colnames(comat) <- paste0("Species",1:25) dissim <- dissimilarity(comat, metric = "all") #clust <- nhclu_clarans(dissim, index = "Simpson", n_clust = 5)

See also

Author

Pierre Denelle (pierre.denelle@gmail.com) Boris Leroy (leroy.boris@gmail.com) Maxime Lenormand (maxime.lenormand@inrae.fr)

References

Schubert E & Rousseeuw PJ (2019) Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms. Similarity Search and Applications 11807, 171-187.

nhclu_dbscan

Non-hierarchical clustering: DBSCAN

CRAN · 1.4.0 · bioregion/man/nhclu_dbscan.Rd · 2026-05-07

This function performs non-hierarchical clustering based on dissimilarity using the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm.

Aliases

nhclu_dbscan

Usage

nhclu_dbscan( dissimilarity, index = names(dissimilarity)[3], minPts = NULL, eps = NULL, plot = TRUE, algorithm_in_output = TRUE, verbose = TRUE, ... )

Arguments

dissimilarity

The output object from [=dissimilarity]dissimilarity() or [=similarity_to_dissimilarity]similarity_to_dissimilarity(), or a dist object. If a data.frame is used, the first two columns should represent pairs of sites (or any pair of nodes), and the subsequent column(s) should contain the dissimilarity indices.

index

The name or number of the dissimilarity column to use. By default, the third column name of dissimilarity is used.

minPts

A numeric vector or a single numeric value specifying the minPts argument of [dbscan:dbscan]dbscan::dbscan(). minPts is the minimum number of points to form a dense region. By default, it is set to the natural logarithm of the number of sites in dissimilarity. See Details for guidance on choosing this parameter.

eps

A numeric vector or a single numeric value specifying the eps argument of [dbscan:dbscan]dbscan::dbscan(). eps specifies how similar points should be to each other to be considered part of a cluster. See Details for guidance on choosing this parameter.

plot

A boolean indicating whether the k-nearest neighbor distance plot should be displayed.

algorithm_in_output

A boolean indicating whether the original output of [dbscan:dbscan]dbscan::dbscan should be included in the output. Defaults to TRUE (see Value).

verbose

A boolean indicating whether to display progress messages. Set to FALSE to suppress these messages.

...

Additional arguments to be passed to dbscan() (see [dbscan:dbscan]dbscan::dbscan).

Details

The DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm clusters points based on the density of neighbors around each data point. It requires two main arguments: minPts, the minimum number of points to identify a core, and eps, the radius used to find neighbors. Choosing minPts: This determines how many points are necessary to form a cluster. For example, what is the minimum number of sites expected in a bioregion? Choose a value sufficiently large for your dataset and expectations. Choosing eps: This determines how similar sites should be to form a cluster. If eps is too small, most points will be considered too distinct and marked as noise. If eps is too large, clusters may merge. The value of eps depends on minPts. It is recommended to choose eps by identifying a knee in the k-nearest neighbor distance plot. By default, the function attempts to find a knee in this curve automatically, but the result is uncertain. Users should inspect the graph and modify eps accordingly. To explore eps values, run the function initially without defining eps, review the recommendations, and adjust as needed based on clustering results.

Value

Examples

comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001), 20, 25) rownames(comat) <- paste0("Site",1:20) colnames(comat) <- paste0("Species",1:25) dissim <- dissimilarity(comat, metric = "all") clust1 <- nhclu_dbscan(dissim, index = "Simpson") clust2 <- nhclu_dbscan(dissim, index = "Simpson", eps = 0.2) clust3 <- nhclu_dbscan(dissim, index = "Simpson", minPts = c(5, 10, 15, 20), eps = c(.1, .15, .2, .25, .3))

See also

Author

Boris Leroy (leroy.boris@gmail.com) Pierre Denelle (pierre.denelle@gmail.com) Maxime Lenormand (maxime.lenormand@inrae.fr)

References

Hahsler M, Piekenbrock M & Doran D (2019) Dbscan: Fast density-based clustering with R. Journal of Statistical Software, 91(1), 1--30.

nhclu_kmeans

Non-hierarchical clustering: K-means analysis

CRAN · 1.4.0 · bioregion/man/nhclu_kmeans.Rd · 2026-05-07

This function performs non-hierarchical clustering based on dissimilarity using a k-means analysis.

Aliases

nhclu_kmeans

Usage

nhclu_kmeans( dissimilarity, index = names(dissimilarity)[3], seed = NULL, n_clust = c(1, 2, 3), iter_max = 10, nstart = 10, algorithm = "Hartigan-Wong", algorithm_in_output = TRUE )

Arguments

dissimilarity

The output object from [=dissimilarity]dissimilarity() or [=similarity_to_dissimilarity]similarity_to_dissimilarity(), or a dist object. If a data.frame is used, the first two columns should represent pairs of sites (or any pair of nodes), and the subsequent column(s) should contain the dissimilarity indices.

index

The name or number of the dissimilarity column to use. By default, the third column name of dissimilarity is used.

seed

A value for the random number generator (NULL for random by default).

n_clust

An integer vector or a single integer value specifying the requested number(s) of clusters.

iter_max

An integer specifying the maximum number of iterations for the k-means method (see [stats:kmeans]kmeans).

nstart

An integer specifying how many random sets of n_clust should be selected as starting points for the k-means analysis (see [stats:kmeans]kmeans).

algorithm

A character specifying the algorithm to use for k-means (see [stats:kmeans]kmeans). Available options are Hartigan-Wong, Lloyd, Forgy, and MacQueen.

algorithm_in_output

A boolean indicating whether the original output of [stats:kmeans]kmeans should be included in the output. Defaults to TRUE (see Value).

Details

This method partitions data into k groups such that the sum of squares of Euclidean distances from points to the assigned cluster centers is minimized. K-means cannot be applied directly to dissimilarity or beta-diversity metrics because these distances are not Euclidean. Therefore, it first requires transforming the dissimilarity matrix using Principal Coordinate Analysis (PCoA) with [ape:pcoa]pcoa, and then applying k-means to the coordinates of points in the PCoA. Because this additional transformation alters the initial dissimilarity matrix, the partitioning around medoids method (nhclu_pam) is preferred.

Value

Examples

comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001), 20, 25) rownames(comat) <- paste0("Site",1:20) colnames(comat) <- paste0("Species",1:25) comnet <- mat_to_net(comat) dissim <- dissimilarity(comat, metric = "all") clust <- nhclu_kmeans(dissim, n_clust = 2:10, index = "Simpson")

See also

Author

Boris Leroy (leroy.boris@gmail.com) Pierre Denelle (pierre.denelle@gmail.com) Maxime Lenormand (maxime.lenormand@inrae.fr)

nhclu_pam

Non-hierarchical clustering: Partitioning Around Medoids

CRAN · 1.4.0 · bioregion/man/nhclu_pam.Rd · 2026-05-07

This function performs non-hierarchical clustering based on dissimilarity using partitioning around medoids (PAM).

Aliases

nhclu_pam

Usage

nhclu_pam( dissimilarity, index = names(dissimilarity)[3], seed = NULL, n_clust = c(1, 2, 3), variant = "faster", nstart = 1, cluster_only = FALSE, algorithm_in_output = TRUE, ... )

Arguments

dissimilarity

The output object from [=dissimilarity]dissimilarity() or [=similarity_to_dissimilarity]similarity_to_dissimilarity(), or a dist object. If a data.frame is used, the first two columns should represent pairs of sites (or any pair of nodes), and the subsequent column(s) should contain the dissimilarity indices.

index

The name or number of the dissimilarity column to use. By default, the third column name of dissimilarity is used.

seed

A value for the random number generator (NULL for random by default).

n_clust

An integer vector or a single integer value specifying the requested number(s) of clusters.

variant

A character string specifying the PAM variant to use. Defaults to faster. Available options are original, o_1, o_2, f_3, f_4, f_5, or faster. See [cluster:pam]pam for more details.

nstart

An integer specifying the number of random starts for the PAM algorithm. Defaults to 1 (for the faster variant).

cluster_only

A boolean specifying whether only the clustering results should be returned from the [cluster:pam]pam function. Setting this to TRUE makes the function more efficient.

algorithm_in_output

A boolean indicating whether the original output of [cluster:pam]pam should be included in the result. Defaults to TRUE (see Value).

...

Additional arguments to pass to pam() (see [cluster:pam]pam).

Details

This method partitions the data into the chosen number of clusters based on the input dissimilarity matrix. It is more robust than k-means because it minimizes the sum of dissimilarities between cluster centers (medoids) and points assigned to the cluster. In contrast, k-means minimizes the sum of squared Euclidean distances, which makes it unsuitable for dissimilarity matrices that are not based on Euclidean distances.

Value

Examples

comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001), 20, 25) rownames(comat) <- paste0("Site",1:20) colnames(comat) <- paste0("Species",1:25) comnet <- mat_to_net(comat) dissim <- dissimilarity(comat, metric = "all") clust <- nhclu_pam(dissim, n_clust = 2:15, index = "Simpson")

See also

Author

Boris Leroy (leroy.boris@gmail.com) Pierre Denelle (pierre.denelle@gmail.com) Maxime Lenormand (maxime.lenormand@inrae.fr)

References

Kaufman L & Rousseeuw PJ (2009) Finding groups in data: An introduction to cluster analysis. In & Sons. JW (ed.), Finding groups in data: An introduction to cluster analysis.

similarity

Compute similarity metrics between sites based on species composition

CRAN · 1.4.0 · bioregion/man/similarity.Rd · 2026-05-07

This function generates a data.frame where each row provides one or several similarity metrics between pairs of sites, based on a co-occurrence matrix with sites as rows and species as columns.

Aliases

similarity

Usage

similarity(comat, metric = "Simpson", formula = NULL, method = "prodmat")

Arguments

comat

A co-occurrence matrix with sites as rows and species as columns.

metric

formula

A character vector or a single character string specifying custom formula(s) based on the a, b, c, A, B, and C quantities (see Details). The default is NULL.

method

Details

With a the number of species shared by a pair of sites, b species only present in the first site and c species only present in the second site. Jaccard = 1 - (b + c) / (a + b + c) Jaccardturn = 1 - 2min(b, c) / (a + 2min(b, c)) (Baselga, 2012) Sorensen = 1 - (b + c) / (2a + b + c) Simpson = 1 - min(b, c) / (a + min(b, c)) If abundances data are available, Bray-Curtis and its turnover component can also be computed with the following equation: Bray = 1 - (B + C) / (2A + B + C) Brayturn = 1 - min(B, C) / (A + min(B, C)) (Baselga, 2013) with A the sum of the lesser values for common species shared by a pair of sites. B and C are the total number of specimens counted at both sites minus A. formula can be used to compute customized metrics with the terms a, b, c, A, B, and C. For example formula = c("1 - pmin(b,c) / (a + pmin(b,c))", "1 - (B + C) / (2*A + B + C)") will compute the Simpson and Bray-Curtis similarity metrics, respectively. Note that pmin is used in the Simpson formula because a, b, c, A, B and C are numeric vectors. Euclidean computes the Euclidean similarity between each pair of sites following this equation: Euclidean = 1 / (1 + d_ij) Where d_ij is the Euclidean distance between site i and site j in terms of species composition.

Value

A data.frame with the additional class bioregion.pairwise, containing one or several similarity metrics between pairs of sites. The first two columns represent the pairs of sites. There is one column per similarity metric provided in metric and formula, except for the abc and ABC metrics, which are stored in three separate columns (one for each letter).

Examples

comat <- matrix(sample(0:1000, size = 50, replace = TRUE, prob = 1 / 1:1001), 5, 10) rownames(comat) <- paste0("s", 1:5) colnames(comat) <- paste0("sp", 1:10) sim <- similarity(comat, metric = c("abc", "ABC", "Simpson", "Brayturn")) sim <- similarity(comat, metric = "all", formula = "1 - (b + c) / (a + b + c)")

See also

Author

Maxime Lenormand (maxime.lenormand@inrae.fr) Pierre Denelle (pierre.denelle@gmail.com) Boris Leroy (leroy.boris@gmail.com)

References

Baselga A (2012) The Relationship between Species Replacement, Dissimilarity Derived from Nestedness, and Nestedness. Global Ecology and Biogeography 21, 1223--1232. Baselga A (2013) Separating the two components of abundance-based dissimilarity: balanced changes in abundance vs. abundance gradients. Methods in Ecology and Evolution 4, 552--557.

similarity_to_dissimilarity

Convert similarity metrics to dissimilarity metrics

CRAN · 1.4.0 · bioregion/man/similarity_to_dissimilarity.Rd · 2026-05-07

This function converts a data.frame of similarity metrics between sites into dissimilarity metrics (beta diversity).

Aliases

similarity_to_dissimilarity

Usage

similarity_to_dissimilarity(similarity, include_formula = TRUE)

Arguments

similarity

The output object from [=similarity]similarity() or [=dissimilarity_to_similarity]dissimilarity_to_similarity().

include_formula

A boolean indicating whether metrics based on custom formula(s) should also be converted (see Details). The default is TRUE.

Value

A data.frame with additional class bioregion.pairwise, providing dissimilarity metric(s) between each pair of sites based on a similarity object.

Examples

comat <- matrix(sample(0:1000, size = 50, replace = TRUE, prob = 1 / 1:1001), 5, 10) rownames(comat) <- paste0("s", 1:5) colnames(comat) <- paste0("sp", 1:10) simil <- similarity(comat, metric = "all") simil dissimilarity <- similarity_to_dissimilarity(simil) dissimilarity

See also

Note

The behavior of this function changes depending on column names. Columns Site1 and Site2 are copied identically. If there are columns called a, b, c, A, B, C they will also be copied identically. If there are columns based on your own formula (argument formula in [=similarity]similarity()) or not in the original list of similarity metrics (argument metrics in [=similarity]similarity()) and if the argument include_formula is set to FALSE, they will also be copied identically. Otherwise there are going to be converted like they other columns (default behavior). If a column is called Euclidean, its distance will be calculated based on the following formula: Euclidean distance = (1 - Euclidean similarity) / Euclidean similarity Otherwise, all other columns will be transformed into dissimilarity with the following formula: dissimilarity = 1 - similarity

Author

Maxime Lenormand (maxime.lenormand@inrae.fr) Boris Leroy (leroy.boris@gmail.com) Pierre Denelle (pierre.denelle@gmail.com)

site_species_metrics

Calculate metrics for sites and species relative to bioregions and chorotypes

CRAN · 1.4.0 · bioregion/man/site_species_metrics.Rd · 2026-05-07

This function computes metrics that quantify how species and sites relate to clusters (bioregions or chorotypes). Depending on the type of clustering, metrics can measure how species are distributed across bioregions (site clusters), how sites relate to chorotypes (species clusters), or both.

Aliases

site_species_metrics

Usage

site_species_metrics( bioregionalization, bioregion_metrics = c("Specificity", "NSpecificity", "Fidelity", "IndVal", "NIndVal", "Rho"), bioregionalization_metrics = "P", data_type = "auto", cluster_on = "site", comat, similarity = NULL, include_cluster = FALSE, index = names(similarity)[3], verbose = TRUE )

Arguments

bioregionalization

A bioregion.clusters object.

bioregion_metrics

A character vector or a single character string specifying the metrics to compute for each cluster. Available metrics depend on the type of clustering (see arg cluster_on): When sites are clustered into bioregions (default case): species-level metrics include "Specificity", "NSpecificity", "Fidelity", "IndVal", "NIndVal", "Rho", and "CoreTerms". Site-level metrics include "Richness", "Rich_Endemics", "Prop_Endemics", "MeanSim", and "SdSim". When species are clustered into chorotypes (e.g., bipartite network clustering): site-level metrics include "Specificity", "NSpecificity", "Fidelity", "IndVal", "NIndVal", "Rho", and "CoreTerms". Use "all" to compute all available metrics. See Details for metric descriptions.

bioregionalization_metrics

A character vector or a single character string specifying summary metrics computed across all clusters. These metrics assess how an entity (species or site) is distributed across the entire bioregionalization, rather than relative to each individual cluster: "P": Participation coefficient measuring how evenly a species or site is distributed across clusters (0 = restricted to one cluster, 1 = evenly spread). "Silhouette": How well a site fits its assigned bioregion compared to the nearest alternative bioregion (requires similarity data). Use "all" to compute all available metrics.

data_type

A character string specifying whether metrics should be computed based on presence/absence ("occurrence") or abundance values ("abundance"). This affects how Specificity, Fidelity, IndVal, Rho and CoreTerms are calculated: "auto" (default): Automatically detected from input data (bioregionalization and/or comat). "occurrence": Metrics based on presence/absence only. "abundance": Metrics weighted by abundance values. "both": Compute both versions of the metrics.

cluster_on

A character string specifying what was clustered in the bioregionalization, which determines what types of metrics can be computed: "site" (default): Sites were clustered into bioregions. Metrics describe how each species is distributed across bioregions. "species": Species were clustered into chorotypes. Metrics describe how each site relates to chorotypes. Only available when species have been assigned to clusters (e.g., bipartite network clustering). "both": Compute metrics for both perspectives. Only available when both sites and species have cluster assignments.

comat

A site-species matrix with sites as rows and species as columns. Values can be occurrence (1/0) or abundance. Required for most metrics.

similarity

A site-by-site similarity object from [=similarity]similarity() or [=dissimilarity_to_similarity]dissimilarity_to_similarity(). Required only for similarity-based metrics ("MeanSim", "SdSim", "Silhouette").

include_cluster

A boolean indicating whether to add an Assigned column in the output, marking TRUE for rows where the site belongs to the bioregion being evaluated. Useful for quickly identifying a site's own bioregion. Default is FALSE.

index

The name or number of the column to use as similarity. By default, the third column name of similarity is used.

verbose

A boolean indicating whether to display progress messages. Set to FALSE to suppress these messages.

Details

This function computes metrics that characterize the relationship between species, sites, and clusters. The available metrics depend on whether you clustered sites (into bioregions) or species (into chorotypes). --- 1. Understanding the two perspectives --- Bioregions are clusters of sites with similar species composition. Chorotypes are clusters of species with similar distributions. In general, the package is designed to cluster sites into bioregions. However, it is possible to group species into clusters. We call these species clusters 'chorotypes', following conceptual definitions in the biogeographical literature, to avoid any confusion in the calculation of metrics. In some cases, such as bipartite network clustering, both species and sites receive the same clusters. We maintain the name distinction in the calculation of metrics - but remember that in this case BIOREGION IDs = CHOROTYPE IDs. The cluster_on argument determines which perspective to use. --- 2. Metrics when sites are clustered (cluster_on = "site" or cluster_on = "both") --- Species-per-bioregion metrics quantify how each species is distributed across bioregions. These metrics are derived from three core terms (https://biorgeo.github.io/bioregion/articles/a5_2_summary_metrics.html#metric-componentssee the online vignette for a visual diagram): n_sb: Number of sites in bioregion b where species s is present n_s: Total number of sites in which species s is present. n_b: Total number of sites in bioregion b. Abundance version of these core terms can also be calculated when data_type = "abundance" (or data_type = "auto" and bioregionalization was based on abundance): w_sb: Sum of abundances of species s in sites of bioregion b. w_s: Total abundance of species s. w_b: Total abundance of all species present in sites of bioregion b. The species-per-bioregion metrics are (click on metric names to access formulas): https://biorgeo.github.io/bioregion/articles/a5_2_summary_metrics.html#specificity-occurrenceSpecificity: Fraction of a species' occurrences found in a given bioregion (De Cáceres & Legendre 2009). A value of 1 means the species occurs only in that bioregion. https://biorgeo.github.io/bioregion/articles/a5_2_summary_metrics.html#nspecificity-occurrenceNSpecificity: Normalized specificity that accounts for differences in bioregion size (De Cáceres & Legendre 2009). https://biorgeo.github.io/bioregion/articles/a5_2_summary_metrics.html#fidelity-occurrenceFidelity: Fraction of sites in a bioregion where the species occurs (De Cáceres & Legendre 2009). A value of 1 means the species is present in all sites of that bioregion. https://biorgeo.github.io/bioregion/articles/a5_2_summary_metrics.html#indval-occurrenceIndVal: Indicator Value = Specificity × Fidelity (De Cáceres & Legendre 2009). High values identify species that are both restricted to and frequent within a bioregion. https://biorgeo.github.io/bioregion/articles/a5_2_summary_metrics.html#nindval-occurrenceNIndVal: Normalized IndVal accounting for bioregion size (De Cáceres & Legendre 2009). https://biorgeo.github.io/bioregion/articles/a5_2_summary_metrics.html#rho-occurrenceRho: Standardized contribution index comparing observed vs. expected co-occurrence under random association (Lenormand 2019). CoreTerms: Raw counts (n, n_b, n_s, n_sb) for custom calculations. These metrics can be found in the output slot species_bioregions. Site-per-bioregion metrics characterize sites relative to bioregions: https://biorgeo.github.io/bioregion/articles/a5_2_summary_metrics.html#diversity-endemicity-site-metricsRichness: Number of species in the site. https://biorgeo.github.io/bioregion/articles/a5_2_summary_metrics.html#diversity-endemicity-site-metricsRich_Endemics: Number of species in the site that are endemic to one bioregion. https://biorgeo.github.io/bioregion/articles/a5_2_summary_metrics.html#diversity-endemicity-site-metricsProp_Endemics: Proportion of endemic species in the site. https://biorgeo.github.io/bioregion/articles/a5_2_summary_metrics.html#meansimMeanSim: Mean similarity of a site to all sites in each bioregion. https://biorgeo.github.io/bioregion/articles/a5_2_summary_metrics.html#sdsimSdSim: Standard deviation of similarity values. These metrics can be found in the output slot site_bioregions. Summary metrics across the whole bioregionalization: These metrics summarize how an entity (species or site) is distributed across all clusters, rather than in relation to each individual cluster. Species-level summary metric: https://biorgeo.github.io/bioregion/articles/a5_2_summary_metrics.html#p-occurrence-1P (Participation): Evenness of species distribution across bioregions (Denelle et al. 2020). Found in output slot species_bioregionalization. Site-level summary metric: https://biorgeo.github.io/bioregion/articles/a5_2_summary_metrics.html#silhouetteSilhouette: How well a site fits its assigned bioregion vs. the nearest alternative (Rousseeuw 1987). Found in output slot site_bioregionalization. --- 3. Metrics when species are clustered (cluster_on = "species" or cluster_on = "both") --- Site-per-chorotype metrics quantify how each site relates to species clusters (chorotypes). The same metrics as above (Specificity, Fidelity, IndVal, etc.) can be computed, but their interpretation is inverted. These metrics are based on the following core terms: n_gc: Number of species belonging to chorotype c that are present in site g. n_g: Total number of species present in site g. n_c: Total number of species belonging to chorotype c. Abundance version of these core terms can also be calculated when data_type = "abundance" (or data_type = "auto" and bioregionalization was based on abundance). Their interpretation changes, for example: Specificity: Fraction of a site's species belonging to a chorotype. Fidelity: Fraction of a chorotype's species present in the site. IndVal: Indicator value for site-chorotype associations. P: Evenness of sites across chorotypes

Value

A list containing one or more data.frame elements, depending on the selected metrics and clustering type: When sites are clustered (cluster_on = "site"): species_bioregions: Metrics for each species x bioregion combination (e.g., Specificity, IndVal). One row per species x bioregion pair. species_bioregionalization: Summary metrics for each species across all bioregions (e.g., Participation coefficient). One row per species. site_bioregions: Metrics for each site x bioregion combination (e.g., MeanSim, Richness). One row per site x bioregion pair. site_bioregionalization: Summary metrics for each site (e.g., Silhouette). One row per site. When species are clustered (cluster_on = "species"): site_chorotypes: Metrics for each site x chorotype combination (e.g., Specificity, IndVal). One row per site x chorotype pair. site_chorological: Summary metrics for each site across all chorotypes (e.g., Participation coefficient). One row per site. Note that if bioregionalization contains multiple partitions (i.e., if dim(bioregionalization$clusters) > 2), a nested list will be returned, with one sublist per partition.

Examples

data(fishmat) fishsim <- similarity(fishmat, metric = "Jaccard") bioregionalization <- hclu_hierarclust(similarity_to_dissimilarity(fishsim), index = "Jaccard", method = "average", randomize = TRUE, optimal_tree_method = "best", n_clust = c(1,2,3), verbose = FALSE) ind <- site_species_metrics(bioregionalization = bioregionalization, bioregion_metrics = "all", bioregionalization_metrics = "all", data_type = "auto", cluster_on = "site", comat = fishmat, similarity = fishsim, include_cluster = TRUE, index = 3, verbose = TRUE)

See also

Note

If data_type = "auto", the choice between occurrence- or abundance- based metrics will be determined automatically from the input data, and a message will explain the choice made. Strict matching between entity IDs (site and species IDs) in bioregionalization and in comat / similarity is required.

Author

Maxime Lenormand (maxime.lenormand@inrae.fr) Boris Leroy (leroy.boris@gmail.com) Pierre Denelle (pierre.denelle@gmail.com)

References

De Cáceres M & Legendre P (2009) Associations between species and groups of sites: indices and statistical inference. Ecology 90, 3566--3574. Denelle P, Violle C & Munoz F (2020) Generalist plants are more competitive and more functionally similar to each other than specialist plants: insights from network analyses. Journal of Biogeography 47, 1922–-1933. Lenormand M, Papuga G, Argagnon O, Soubeyrand M, Alleaume S & Luque S (2019) Biogeographical network analysis of plant species distribution in the Mediterranean region. Ecology and Evolution 9, 237--250. Rousseeuw PJ (1987) Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics 20, 53--65.

site_species_subset

Extract a subset of sites or species from a bioregion.clusters object

CRAN · 1.4.0 · bioregion/man/site_species_subset.Rd · 2026-05-07

This function extracts a subset of nodes based on their type ("site" or "species") from a bioregion.clusters object, which contains both types of nodes (sites and species).

Aliases

site_species_subset

Usage

site_species_subset(bioregionalization, node_type = "site", clusters = NULL)

Arguments

bioregionalization

An object of class bioregion.clusters.

node_type

A character string indicating the type of nodes to extract. Possible values are "site" or "species". The default is "site".

clusters

Deprecated. Use bioregionalization instead.

Value

An object of class bioregion.clusters containing only the specified node type (sites or species).

Examples

net <- data.frame( Site = c(rep("A", 2), rep("B", 3), rep("C", 2)), Species = c("a", "b", "a", "c", "d", "b", "d"), Weight = c(10, 100, 1, 20, 50, 10, 20) ) clu <- netclu_louvain(net, lang = "igraph", bipartite = TRUE) clu_sites <- site_species_subset(clu, node_type = "site")

Note

Some bioregion.clusters objects may contain both types of nodes (sites and species). This information is available in the $inputs$node_type slot. This function allows you to extract a specific type of node (either sites or species) from any bioregion.clusters object that includes both.

Author

Maxime Lenormand (maxime.lenormand@inrae.fr) Pierre Denelle (pierre.denelle@gmail.com) Boris Leroy (leroy.boris@gmail.com)

vegedf

Spatial distribution of Mediterranean vegetation (data.frame)

CRAN · 1.4.0 · data · bioregion/man/vegedf.Rd · 2026-05-07

A dataset containing the abundance of 3,697 species in 715 sites.

Aliases

vegedf

Keywords

datasets

Usage

vegedf

Format

A data.frame with 460,878 rows and 3 columns: SiteUnique site identifier (corresponding to the field ID of vegesp) SpeciesUnique species identifier AbundanceSpecies abundance

Source

10.1002/ece3.4718

vegemat

Spatial distribution of Mediterranean vegetation (co-occurrence matrix)

CRAN · 1.4.0 · data · bioregion/man/vegemat.Rd · 2026-05-07

A dataset containing the abundance of each of the 3,697 species in each of the 715 sites.

Aliases

vegemat

Keywords

datasets

Usage

vegemat

Format

A co-occurrence matrix with sites as rows and species as columns. Each element of the matrix represents the abundance of the species in the site.

Source

10.1002/ece3.4718

vegesf

Spatial distribution of Mediterranean vegetation (sf data.frame)

CRAN · 1.4.0 · data · bioregion/man/vegesf.Rd · 2026-05-07

A dataset containing the geometry of the 715 sites.

Aliases

vegesf

Keywords

datasets

Usage

vegesf

Format

A sf data.frame with 715 rows and 2 columns: SiteUnique site identifier geometryGeometry of the site

Source

10.1002/ece3.4718

버전 이력

Repository	Version	Published	First seen	Last seen
CRAN	1.3.0	2026-01-23	2026-05-31	2026-05-31
CRAN	1.2.0	2025-01-31	2026-05-31	2026-05-31
CRAN	1.1.1-1	2024-11-11	2026-05-31	2026-05-31
CRAN	1.1.1	2024-04-19	2026-05-31	2026-05-31
CRAN	1.1.0	2024-03-19	2026-05-31	2026-05-31
CRAN	1.0.0	2023-04-14	2026-05-31	2026-05-31
CRAN	1.4.0		2026-05-31	2026-07-10

보안

표시할 OSV 데이터가 없습니다.

문헌 신호

표시할 OpenAlex 데이터가 없습니다.