Information Limits for Community Detection in Hypergraph with Label Information

In network data mining, community detection refers to the problem of partitioning the nodes of a network into clusters (communities). This is equivalent to identifying the cluster label of each node. A label estimator is said to be an exact recovery of the true labels (communities) if it coincides w...

Description complète

Enregistré dans:
Détails bibliographiques
Auteurs principaux: Xiaofeng Zhao, Wei Zhao, Mingao Yuan
Format: article
Langue:EN
Publié: MDPI AG 2021
Sujets:
Accès en ligne:https://doaj.org/article/5784e698c94c4263b71c7c06a7aa7e56
Tags: Ajouter un tag
Pas de tags, Soyez le premier à ajouter un tag!
Description
Résumé:In network data mining, community detection refers to the problem of partitioning the nodes of a network into clusters (communities). This is equivalent to identifying the cluster label of each node. A label estimator is said to be an exact recovery of the true labels (communities) if it coincides with the true labels with a probability convergent to one. In this work, we consider the effect of label information on the exact recovery of communities in an <i>m</i>-uniform Hypergraph Stochastic Block Model (HSBM). We investigate two scenarios of label information: <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mo>(</mo><mn>1</mn><mo>)</mo></mrow></semantics></math></inline-formula> a noisy label for each node is observed independently, with <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>1</mn><mo>−</mo><msub><mi>α</mi><mi>n</mi></msub></mrow></semantics></math></inline-formula> as the probability that the noisy label will match the true label; <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mo>(</mo><mn>2</mn><mo>)</mo></mrow></semantics></math></inline-formula> the true label of each node is observed independently, with the probability of <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>1</mn><mo>−</mo><msub><mi>α</mi><mi>n</mi></msub></mrow></semantics></math></inline-formula>. We derive sharp boundaries for exact recovery under both scenarios from an information-theoretical point of view. The label information improves the sharp detection boundary if and only if <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><msub><mi>α</mi><mi>n</mi></msub><mo>=</mo><msup><mi>n</mi><mrow><mo>−</mo><mi>β</mi><mo>+</mo><mi>o</mi><mo>(</mo><mn>1</mn><mo>)</mo></mrow></msup></mrow></semantics></math></inline-formula> for a constant <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>β</mi><mo>></mo><mn>0</mn></mrow></semantics></math></inline-formula>.