Generation of database

class pyAgrum.BNDatabaseGenerator(bn)

BNDatabaseGenerator is used to easily generate databases from a pyAgrum.BayesNet.

Parameters: bn (pyAgrum.BayesNet) – the Bayesian network used to generate data.

bn()

Get the Bayesian network used to generate the samples

Returns: The Bayesian network
Return type: pyAgrum.BayesNet

drawSamples(*args)

Generate and store a database by sampling the Bayesian network.

If evs is specified, a sample is stored only if it is compatible with these observations.

Returns the log2likelihood of this database.

Parameters

nbSamples (int) – the number of samples that will be generated
evs ("pyAgrum.Instantiation" or Dict[int|str, int|str]) – (optional) The evidence that the resulting samples must be compatible with.

Return type

float

Warning

nbSamples is not the size of the database but the number of generated samples. The evidence may be very rare (or even impossible). In that case the generated database may contain only a few samples (it may even be empty).

Examples

>>> import pyAgrum as gum
>>> bn=gum.fastBN('A->B{yes|maybe|no}<-C->D->E<-F<-B')
>>> g=gum.BNDatabaseGenerator(bn)
>>> g.setRandomVarOrder()
>>> g.drawSamples(100,{'B':'yes','E':'1'})
-233.16554130404904
>>> g.to_pandas()
    D  E  C    B  F  A
0   1  1  0  yes  1  1
1   1  1  0  yes  1  0
2   1  1  1  yes  0  1
3   1  1  0  yes  0  0
4   1  1  0  yes  0  1
5   1  1  0  yes  1  0
6   1  1  0  yes  0  0
7   0  1  1  yes  1  1
8   1  1  0  yes  0  1
9   0  1  0  yes  1  1
10  1  1  0  yes  1  1
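
The gap between nbSamples and the number of stored rows (11 rows for 100 samples in the example above) can be illustrated with a small pure-Python sketch. It does not use pyAgrum and is not its implementation: the helper draw_samples, the toy network A -> B and its probabilities are all made up for illustration, and it assumes that samples incompatible with the evidence are simply discarded.

```python
import random

def draw_samples(nb_samples, evidence, rng):
    """Forward-sample a toy network A -> B and keep only rows matching `evidence`.

    Hypothetical helper for illustration; the probabilities are made up.
    """
    kept = []
    for _ in range(nb_samples):
        a = 1 if rng.random() < 0.3 else 0      # P(A=1) = 0.3
        p_b1 = 0.9 if a == 1 else 0.05          # P(B=1 | A)
        b = 1 if rng.random() < p_b1 else 0
        row = {"A": a, "B": b}
        # Store the row only if it agrees with every observed value.
        if all(row[name] == value for name, value in evidence.items()):
            kept.append(row)
    return kept

rng = random.Random(0)
rows = draw_samples(100, {"B": 1}, rng)
print(len(rows), "rows kept out of 100 generated samples")

# Impossible evidence (B has no value 2) yields an empty database.
empty = draw_samples(100, {"B": 2}, rng)
print(len(empty), "rows kept with impossible evidence")
```

With rare evidence, the practical remedy in this sketch is to raise the number of generated samples until enough rows are kept.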

log2likelihood()

Get the log2likelihood of the generated database

Raises: pyAgrum.OperationNotAllowed – if nothing has been sampled yet (using gum.BNDatabaseGenerator.drawSamples() for instance)
Returns: the log2likelihood
Return type: float
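
As a rough illustration of what this number measures, the sketch below computes a log2-likelihood by hand for a toy network A -> B. It assumes that the log2-likelihood of a database is the sum, over its rows, of the base-2 logarithm of the probability of each row under the network; the parameters and the helper log2_likelihood are made up and do not come from pyAgrum.

```python
import math

# Made-up parameters for a toy network A -> B (both variables binary).
P_A = [0.7, 0.3]                    # P(A=0), P(A=1)
P_B_GIVEN_A = [[0.95, 0.05],        # P(B=0 | A=0), P(B=1 | A=0)
               [0.10, 0.90]]        # P(B=0 | A=1), P(B=1 | A=1)

def log2_likelihood(rows):
    """Sum log2 P(a, b) over all (a, b) rows (hypothetical helper)."""
    return sum(math.log2(P_A[a] * P_B_GIVEN_A[a][b]) for a, b in rows)

database = [(0, 0), (0, 0), (1, 1), (0, 1)]
print(log2_likelihood(database))
```

Each row contributes a non-positive term, so under this assumption larger databases tend to have more negative values, which is consistent with the negative value returned by drawSamples in the example above.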

samplesAt(row, col)

Get the value of the database in (row,col)

Parameters

row (int) – the row
col (int) – the column (using the ordered list of variables)

Returns

the index of the modality of the variable in this position

Return type

int

samplesLabelAt(row, col)

Get the label of the database in (row,col)

Parameters

row (int) – the row
col (int) – the column (using the ordered list of variables)

Returns

the label of the modality of the variable in this position

Return type

str
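
The relation between samplesAt (index) and samplesLabelAt (label) can be pictured with a pure-Python sketch. The storage layout, the names value_at and label_at, and the data are assumptions made for illustration only; they are not pyAgrum's internals.

```python
# Labels of each variable, indexed by the value's position (made-up data).
labels = {"B": ("yes", "maybe", "no"), "E": ("0", "1")}
var_order = ("E", "B")          # column 0 is E, column 1 is B
samples = [(1, 0), (0, 2)]      # each row stores indices, in var_order

def value_at(row, col):
    """Index of the value stored at (row, col)."""
    return samples[row][col]

def label_at(row, col):
    """Label of the value stored at (row, col)."""
    return labels[var_order[col]][samples[row][col]]

print(value_at(1, 1), label_at(1, 1))
```

Note that the column number refers to the current variable order, so the same col can designate a different variable after the order is changed.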

samplesNbCols()

Return the number of columns in the samples.

Return type: int

samplesNbRows()

Return the number of rows in the samples.

Return type: int

setAntiTopologicalVarOrder()

Select an anti-topological order for the variables in the database.

Return type: None

setRandomVarOrder()

Select a random order for the variables in the database.

Return type: None

setTopologicalVarOrder()

Select a topological order for the variables in the database.

Return type: None
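
The three kinds of order (anti-topological, random, topological) can be sketched with the standard library for the graph A->B<-C->D->E<-F<-B used in the drawSamples example. This is only an illustration of the definitions: it does not call pyAgrum, and the particular order pyAgrum picks among the valid ones may differ.

```python
import random
from graphlib import TopologicalSorter  # Python 3.9+

# Parents of each node in the DAG A->B<-C->D->E<-F<-B.
parents = {
    "A": [], "C": [],
    "B": ["A", "C"],
    "D": ["C"],
    "F": ["B"],
    "E": ["D", "F"],
}

# Topological order: every parent appears before its children.
topological = list(TopologicalSorter(parents).static_order())

# Anti-topological order: every child appears before its parents.
anti_topological = list(reversed(topological))

# Random order: any permutation of the variables.
random_order = random.Random(0).sample(list(parents), len(parents))

print(topological, anti_topological, random_order)
```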

setVarOrder(*args)

Set a specific order with a list of names

Parameters: vars (List[str]) – order specified by the list of variable names.
Return type: None

setVarOrderFromCSV(*args)

Set the same order as in a CSV file

Parameters: filename (str) – the name of the CSV file
Return type: None
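
A plausible reading, stated here as an assumption, is that the order is taken from the header row of the CSV file. The pure-Python sketch below shows that idea on in-memory data: it reads the header, then rearranges a row stored in another order. All names and values are made up and no pyAgrum call is involved.

```python
import csv
import io

# Stand-in for an existing CSV file whose column order should be reused.
csv_text = "D,E,C,B,F,A\n1,1,0,yes,1,1\n"
header = next(csv.reader(io.StringIO(csv_text)))

# A row currently stored in alphabetical variable order.
current_order = ["A", "B", "C", "D", "E", "F"]
row = ["1", "yes", "0", "1", "1", "1"]

# Rearrange the row so its columns follow the CSV header.
reordered = [row[current_order.index(name)] for name in header]
print(header, reordered)
```

Reusing the order of an existing file is mostly useful when appending new samples to it, since the columns must then line up.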

toCSV(*args)

Generate a CSV file representing the generated database.

Parameters

csvFilename (str) – the name of the csv file
useLabels (bool) – whether to write labels (True) or ids (False) in the CSV file (default True)
append (bool) – whether to append to the file (True) or overwrite it (False) (default False)
csvSeparator (str) – the separator used in the CSV file (default ',')

Return type

None
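
The effect of useLabels and csvSeparator can be pictured with the standard csv module. The helper write_csv below is hypothetical, writes to an in-memory buffer rather than a file, and assumes a header row of variable names; it sketches the options and is not pyAgrum's writer.

```python
import csv
import io

def write_csv(out, var_names, labels, samples, use_labels=True, separator=","):
    """Write a header row, then one row per sample (hypothetical helper)."""
    writer = csv.writer(out, delimiter=separator, lineterminator="\n")
    writer.writerow(var_names)
    for row in samples:
        if use_labels:
            # Translate each stored index into the label of that value.
            writer.writerow([labels[n][i] for n, i in zip(var_names, row)])
        else:
            writer.writerow(row)

labels = {"B": ("yes", "maybe", "no"), "E": ("0", "1")}
samples = [(0, 1), (2, 0)]

with_labels = io.StringIO()
write_csv(with_labels, ["B", "E"], labels, samples, separator=";")

with_ids = io.StringIO()
write_csv(with_ids, ["B", "E"], labels, samples, use_labels=False)

print(with_labels.getvalue())
print(with_ids.getvalue())
```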

to_pandas(with_labels=True)

Export the samples as a pandas.DataFrame.

Parameters: with_labels (bool) – whether the DataFrame contains the labels of the values (True) or the indices of these labels (False)

varOrder()

The current order of the variables (as a tuple of NodeId)

Returns: the tuple of NodeId
Return type: Tuple[int]

varOrderNames()

The current order of the variables (as a tuple of names)

Returns: the tuple of names
Return type: Tuple[str]