Generation of database
- class pyAgrum.BNDatabaseGenerator(bn)
BNDatabaseGenerator is used to easily generate databases from a pyAgrum.BayesNet.
- Parameters
bn (pyAgrum.BayesNet) – the Bayesian network used to generate data.
- bn()
Get the Bayesian network used to generate the samples
- Returns
The Bayesian network
- Return type
pyAgrum.BayesNet
- drawSamples(*args)
Generate and store a database by sampling the Bayesian network.
If evs is specified, a sample is stored only if it is compatible with these observations.
Returns the log2likelihood of this database.
- Parameters
nbSamples (int) – the number of samples that will be generated
evs ("pyAgrum.Instantiation" or Dict[int|str, int|str]) – (optional) The evidence that the resulting samples must be compatible with.
- Return type
float
Warning
nbSamples is not the size of the database but the number of generated samples. The evidence may be very rare (or even impossible). In that case the generated database may contain only a few samples, or may even be empty.
Examples
>>> import pyAgrum as gum
>>> bn=gum.fastBN('A->B{yes|maybe|no}<-C->D->E<-F<-B')
>>> g=gum.BNDatabaseGenerator(bn)
>>> g.setRandomVarOrder()
>>> g.drawSamples(100,{'B':'yes','E':'1'})
-233.16554130404904
>>> g.to_pandas()
    D  E  C    B  F  A
0   1  1  0  yes  1  1
1   1  1  0  yes  1  0
2   1  1  1  yes  0  1
3   1  1  0  yes  0  0
4   1  1  0  yes  0  1
5   1  1  0  yes  1  0
6   1  1  0  yes  0  0
7   0  1  1  yes  1  1
8   1  1  0  yes  0  1
9   0  1  0  yes  1  1
10  1  1  0  yes  1  1
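The warning above can be illustrated without pyAgrum. The following is a minimal sketch of sampling with rejection on a toy network A -> B; the function name `draw_samples`, the network, and its probabilities are assumptions made for illustration, and this is not pyAgrum's implementation.

```python
import random

def draw_samples(nb_samples, evidence, seed=0):
    """Forward-sample a toy network A -> B; keep only rows matching the evidence."""
    rng = random.Random(seed)
    kept = []
    for _ in range(nb_samples):
        # Sample A from its prior, then B given A (hypothetical probabilities).
        a = 1 if rng.random() < 0.3 else 0
        p_b1 = 0.9 if a == 1 else 0.05
        b = 1 if rng.random() < p_b1 else 0
        row = {"A": a, "B": b}
        # Store the row only if it agrees with every observed value.
        if all(row[name] == value for name, value in evidence.items()):
            kept.append(row)
    return kept

rows = draw_samples(1000, {"B": 1})
print(len(rows))  # never more than 1000: incompatible samples are discarded
```

With impossible evidence (here, for instance, `{"B": 2}`) every sample is rejected and the result is empty, which mirrors the empty-database case described in the warning.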
- log2likelihood()
Get the log2likelihood of the generated database
- Raises
pyAgrum.OperationNotAllowed – if nothing has been sampled yet (using gum.BNDatabaseGenerator.drawSamples() for instance)
- Returns
the log2likelihood
- Return type
float
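As a rough illustration of the quantity involved, the sketch below computes a log2-likelihood by hand for a toy network A -> B, assuming the usual definition: the sum over all rows of the base-2 logarithm of the joint probability of the row. The probability tables and the helper `log2_likelihood` are hypothetical and do not come from pyAgrum.

```python
import math

# Hypothetical CPTs for a toy network A -> B (binary variables).
P_A = [0.7, 0.3]                     # P(A=0), P(A=1)
P_B_GIVEN_A = [[0.95, 0.05],         # P(B | A=0)
               [0.10, 0.90]]         # P(B | A=1)

def log2_likelihood(rows):
    """Sum of log2 P(a, b) over all rows, with P(a, b) = P(a) * P(b | a)."""
    return sum(math.log2(P_A[a] * P_B_GIVEN_A[a][b]) for a, b in rows)

database = [(0, 0), (0, 0), (1, 1)]
print(log2_likelihood(database))
```

Each row contributes a non-positive term, so the value becomes more negative as the database grows, consistent with the negative figure in the drawSamples example.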
- samplesAt(row, col)
Get the value stored in the database at (row, col)
- Parameters
row (int) – the row
col (int) – the column (using the ordered list of variables)
- Returns
the index of the modality of the variable in this position
- Return type
int
- samplesLabelAt(row, col)
Get the label stored in the database at (row, col)
- Parameters
row (int) – the row
col (int) – the column (using the ordered list of variables)
- Returns
the label of the modality of the variable in this position
- Return type
str
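The difference between the index returned by samplesAt and the label returned by samplesLabelAt can be sketched with plain Python lists. The storage layout and the helper names below are assumptions for illustration only; they say nothing about how pyAgrum stores its samples internally.

```python
# Hypothetical in-memory layout: each row stores one label index per column.
var_order = ["D", "B"]                       # column order of the database
labels = {"D": ["0", "1"], "B": ["yes", "maybe", "no"]}
samples = [[1, 0], [0, 2]]                   # two rows of label indices

def samples_at(row, col):
    """Index of the label stored at (row, col)."""
    return samples[row][col]

def samples_label_at(row, col):
    """Label stored at (row, col), looked up through the column's variable."""
    return labels[var_order[col]][samples[row][col]]

print(samples_at(1, 1), samples_label_at(1, 1))  # 2 no
```

Note that the column number refers to the current variable order, so the same call can return a different variable's value after the order is changed.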
- samplesNbCols()
Return the number of columns in the samples
- Return type
int
- samplesNbRows()
Return the number of rows in the samples
- Return type
int
- setAntiTopologicalVarOrder()
Select an anti-topological order for the variables in the database.
- Return type
None
- setRandomVarOrder()
Select a random order for the variables in the database.
- Return type
None
- setTopologicalVarOrder()
Select a topological order for the variables in the database.
- Return type
None
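For readers unfamiliar with the terms, a topological order lists every variable after all of its parents. The sketch below computes one for the network used in the drawSamples example with the standard library, and takes the anti-topological order to be its reverse, which is an assumption about the intended meaning. A DAG usually admits several valid orders, so the one pyAgrum selects may differ.

```python
from graphlib import TopologicalSorter

# Parents of each node in the example network A->B<-C->D->E<-F<-B.
parents = {
    "A": set(), "C": set(),
    "B": {"A", "C"}, "D": {"C"}, "F": {"B"}, "E": {"D", "F"},
}

# static_order() yields each node only after all of its parents.
topological = list(TopologicalSorter(parents).static_order())
# Assumed meaning of "anti-topological": children before parents.
anti_topological = list(reversed(topological))

print(topological)
print(anti_topological)
```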
- setVarOrder(*args)
Set a specific order with a list of names
- Parameters
vars (List[str]) – order specified by the list of variable names.
- Return type
None
- setVarOrderFromCSV(*args)
Set the same variable order as in a CSV file
- Parameters
filename (str) – the name of the CSV file
- Return type
None
- toCSV(*args)
Write the generated database to a CSV file.
- Parameters
csvFilename (str) – the name of the CSV file
useLabels (bool) – whether to write labels rather than label indices in the CSV file (default true)
append (bool) – whether to append to the file instead of overwriting it (default false)
csvSeparator (str) – the separator used in the CSV file (default ‘,’)
- Return type
None
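The effect of useLabels and csvSeparator can be pictured with the standard csv module. This is only a sketch: the helper `write_samples_csv`, its arguments, and the presence of a header line are assumptions, and the append option is left out. It does not reproduce pyAgrum's exact output format.

```python
import csv
import io

def write_samples_csv(out, names, labels, rows, use_labels=True, separator=","):
    """Write a header line, then one line per sample, as labels or as indices."""
    writer = csv.writer(out, delimiter=separator, lineterminator="\n")
    writer.writerow(names)
    for row in rows:
        if use_labels:
            # Translate each label index into the label of its variable.
            writer.writerow([labels[n][i] for n, i in zip(names, row)])
        else:
            writer.writerow(row)

names = ["D", "B"]
labels = {"D": ["0", "1"], "B": ["yes", "maybe", "no"]}
rows = [[1, 0], [0, 2]]

buffer = io.StringIO()
write_samples_csv(buffer, names, labels, rows, separator=";")
print(buffer.getvalue())
```

Writing indices instead (`use_labels=False`) produces a purely numeric file, which can be convenient for tools that do not know the variables' labels.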
- to_pandas(with_labels=True)
Export the samples as a pandas.DataFrame.
- Parameters
with_labels (bool) – if True, the DataFrame contains the labels of the variables; otherwise it contains the indices of those labels
- varOrder()
The current order of the variables (as a tuple of NodeId)
- Returns
the tuple of NodeId
- Return type
Tuple[int]
- varOrderNames()
The current order of the variables (as a tuple of names)
- Returns
the tuple of names
- Return type
Tuple[str]