This package inherits hSBM from https://github.com/martingerlach/hSBM_Topicmodel and extends it to tripartite networks (aka supervised topic models).
The idea is to run SBM-based topic modeling on networks in which documents are linked to keywords as well as to words.
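To make the structure concrete, the sketch below builds the edge list of a small tripartite network from two count tables: one linking words to documents and one linking keywords to documents. The function name `tripartite_edges` and the toy data are hypothetical and only illustrate the layout. This is not how the package builds its graph internally.

```python
def tripartite_edges(word_counts, keyword_counts):
    """Return (document, node, kind, weight) edges for a toy tripartite network.

    Both arguments map a node name to a {document: count} dict.
    Documents are the only layer connected to both words and keywords.
    """
    edges = []
    for kind, table in (("word", word_counts), ("keyword", keyword_counts)):
        for node, per_doc in table.items():
            for doc, count in per_doc.items():
                if count > 0:  # a zero count means there is no link
                    edges.append((doc, node, kind, count))
    return edges


# Toy data (hypothetical): two documents, two words, one keyword
words = {"w0": {"doc0": 3, "doc1": 0}, "w1": {"doc0": 1, "doc1": 2}}
keywords = {"keyword0": {"doc1": 1}}

edges = tripartite_edges(words, keywords)
for edge in edges:
    print(edge)
```

Every edge has a document at one end, so words and keywords are never connected to each other directly.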
python3 -m pip install . -vv
conda install -c conda-forge nsbm
from nsbm import nsbm
import pandas as pd
import numpy as np
df = pd.DataFrame(
    index = ["w{}".format(w) for w in range(1000)],
    columns = ["doc{}".format(d) for d in range(250)],
    data = np.random.randint(1, 100, 250000).reshape((1000, 250)))
df_key_list = []
## keywords
df_key_list.append(
    pd.DataFrame(
        index = ["keyword{}".format(w) for w in range(100)],
        columns = ["doc{}".format(d) for d in range(250)],
        data = np.random.randint(1, 10, (100, 250)))
)
## authors
df_key_list.append(
    pd.DataFrame(
        index = ["author{}".format(w) for w in range(10)],
        columns = ["doc{}".format(d) for d in range(250)],
        data = np.random.randint(1, 5, (10, 250)))
)
## other features
df_key_list.append(
    pd.DataFrame(
        index = ["feature{}".format(w) for w in range(25)],
        columns = ["doc{}".format(d) for d in range(250)],
        data = np.random.randint(1, 5, (25, 250)))
)
model = nsbm()
model.make_graph_multiple_df(df, df_key_list)
model.fit(n_init=1, B_min=50, verbose=False)
model.save_data()
docker run -it -u jovyan -v $PWD:/home/jovyan/work -p 8899:8888 docker.pkg.github.com/fvalle1/trisbm/trisbm:latest
If a graph.xml.gz file is found in the current directory, the analysis will be performed on it.
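Before starting the container it can be useful to confirm whether such a file is present in the directory that will be mounted. The helper below is a minimal sketch using only the standard library. `find_saved_graph` is a hypothetical name and not part of the package.

```python
from pathlib import Path


def find_saved_graph(directory="."):
    """Return the path to graph.xml.gz in `directory`, or None if absent."""
    candidate = Path(directory) / "graph.xml.gz"
    return candidate if candidate.is_file() else None


found = find_saved_graph()
print("existing graph:", found)
```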
python3 tests/run_tests.py
Please check the following in your data:
- there should be no zero-degree nodes (all nodes should have at least one link)
- there should be no duplicate nodes
- the make_form_BoW_df function discretises the data
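The first two checks can be sketched with pandas before the tables are handed to the model. This assumes tables shaped like the ones in the example above (rows are words or keywords, columns are documents, values are non-negative counts). `check_counts` is a hypothetical helper, not a function of the package. It inspects one table at a time, so it is stricter than necessary for a document that lacks links in one table but has them in another.

```python
import pandas as pd


def check_counts(df, name="df"):
    """Return a list of problems found in a (nodes x documents) count table."""
    problems = []
    # Repeated row or column labels would produce duplicate nodes
    if df.index.duplicated().any():
        problems.append(f"{name}: duplicate row labels")
    if df.columns.duplicated().any():
        problems.append(f"{name}: duplicate column labels")
    # A row or column that sums to zero is a node without any link
    if (df.sum(axis=1) == 0).any():
        problems.append(f"{name}: rows with zero degree")
    if (df.sum(axis=0) == 0).any():
        problems.append(f"{name}: columns with zero degree")
    return problems


good = pd.DataFrame([[1, 2], [0, 3]], index=["w0", "w1"], columns=["doc0", "doc1"])
bad = pd.DataFrame([[1, 0], [0, 0]], index=["w0", "w0"], columns=["doc0", "doc1"])
print(check_counts(good, "good"))
print(check_counts(bad, "bad"))
```

An empty list means the table passed the checks. Offending rows or columns can then be dropped or merged before building the graph.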
See LICENSE.
This work is based in part on sbmtm.
This package depends on graph-tool.