Information Limits for Community Detection in Hypergraph with Label Information
In network data mining, community detection refers to the problem of partitioning the nodes of a network into clusters (communities). This is equivalent to identifying the cluster label of each node. A label estimator is said to be an exact recovery of the true labels (communities) if it coincides w...
Guardado en:
Autores principales: | , , |
---|---|
Formato: | article |
Lenguaje: | EN |
Publicado: |
MDPI AG
2021
|
Materias: | |
Acceso en línea: | https://doaj.org/article/5784e698c94c4263b71c7c06a7aa7e56 |
Etiquetas: |
Agregar Etiqueta
Sin Etiquetas, Sea el primero en etiquetar este registro!
|
Sumario: | In network data mining, community detection refers to the problem of partitioning the nodes of a network into clusters (communities). This is equivalent to identifying the cluster label of each node. A label estimator is said to be an exact recovery of the true labels (communities) if it coincides with the true labels with a probability convergent to one. In this work, we consider the effect of label information on the exact recovery of communities in an <i>m</i>-uniform Hypergraph Stochastic Block Model (HSBM). We investigate two scenarios of label information: <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mo>(</mo><mn>1</mn><mo>)</mo></mrow></semantics></math></inline-formula> a noisy label for each node is observed independently, with <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>1</mn><mo>−</mo><msub><mi>α</mi><mi>n</mi></msub></mrow></semantics></math></inline-formula> as the probability that the noisy label will match the true label; <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mo>(</mo><mn>2</mn><mo>)</mo></mrow></semantics></math></inline-formula> the true label of each node is observed independently, with the probability of <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mn>1</mn><mo>−</mo><msub><mi>α</mi><mi>n</mi></msub></mrow></semantics></math></inline-formula>. We derive sharp boundaries for exact recovery under both scenarios from an information-theoretical point of view. The label information improves the sharp detection boundary if and only if <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><msub><mi>α</mi><mi>n</mi></msub><mo>=</mo><msup><mi>n</mi><mrow><mo>−</mo><mi>β</mi><mo>+</mo><mi>o</mi><mo>(</mo><mn>1</mn><mo>)</mo></mrow></msup></mrow></semantics></math></inline-formula> for a constant <inline-formula><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><semantics><mrow><mi>β</mi><mo>></mo><mn>0</mn></mrow></semantics></math></inline-formula>. |
---|