Scientific article presented by Diego Huerta Ocaña [diego.huertaoa@udlap.mx]

Member of the Honors Program. Bachelor's Degree in Data Science. Department of Actuarial Science, Physics and Mathematics. School of Sciences, Universidad de las Américas Puebla.

Examining Committee

Director: Dr. Gerardo Arizmendi Echegaray
President: Dr. Hugo Villanueva Méndez
Secretary: Dr. Freddy Palma Mancilla

Cholula, Puebla, Mexico, November 19, 2024.

Abstract

In this work, we present a probabilistic model for directed graphs whose nodes have attributes and labels. The model serves as a generative classifier that predicts the labels of unseen nodes using either maximum likelihood or maximum a posteriori estimation. Its predictions are highly interpretable, in contrast with some common methods for node classification, such as graph neural networks. We applied the model to two datasets, demonstrating predictive performance that is competitive with, and even superior to, state-of-the-art methods. One of the datasets is adapted from the Math Genealogy Project, which has not previously been used for this purpose. We therefore also evaluated several classification algorithms on this dataset, both to compare them with our model and to provide benchmarks for this new resource.

Keywords: Probability, Machine Learning, Graphs, Networks, Node Classification.

Table of contents

Chapter 1. Introduction

Chapter 2. Preliminaries

  • 2.1 Probability theory
  • 2.2 Machine Learning Models for Classification

Chapter 3. Model

Chapter 4. Parameter estimation

Chapter 5. Node Classification

  • 5.1 Prediction over a single node
  • 5.2 Prediction over several nodes

Chapter 6. Math Genealogy Project

  • 6.1 Learning task
  • 6.2 Application of the model
  • 6.3 Baselines for subject classification
  • 6.4 Results on the Math Genealogy Project Dataset
  • 6.5 Prediction example

Chapter 7. Ogbn-arxiv dataset

  • 7.1 Learning task
  • 7.2 Application of the model
  • 7.3 Results on the ogbn-arxiv dataset

Chapter 8. Conclusion

References

Huerta Ocaña, D. 2024. A Probabilistic Model for Node Classification in Directed Graphs. Scientific Article, Bachelor's Degree. Data Science. Department of Actuarial Science, Physics and Mathematics, School of Sciences, Universidad de las Américas Puebla. November. All rights reserved © 2024.