Complete Enumeration of BN-doped Polycyclic Aromatic Hydrocarbon Libraries in the MolDis Repository
The MolDis repository, under development at the TIFR Centre for Interdisciplinary Sciences, aims to provide an analytics platform for big data of computed molecular properties. Very large datasets are currently being generated for a range of application domains. The level-1 phase of data generation involves the combinatorial enumeration of all possible molecules that satisfy a few design rules. Starting with all possible plane-filling polycyclic aromatic hydrocarbons (PAHs), we have enumerated all possible doped analogues obtained by substituting pairs of C atoms with pairs of B and N atoms. I will discuss the mathematics of this approach, which is based on Pólya's enumeration theorem, and show how a single PAH with six benzene rings can be doped into 241,813,226,150 different molecules, providing a continuous spectrum of band gaps and other properties.
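As a rough illustration of the counting idea, the sketch below applies Burnside's lemma (the fixed-point form of Pólya counting) to a single benzene ring rather than to the six-ring PAH of the talk. It makes two assumptions that are not taken from the abstract: the six carbon sites are permuted by the dihedral group of order 12, and a doping pattern is specified only by how many B, N and C atoms it contains, with equal numbers of B and N. The function names are hypothetical, and the code does not attempt to reproduce the figure of 241,813,226,150.

```python
from itertools import product


def cycle_lengths(perm):
    """Return the cycle lengths of a permutation given as a tuple i -> perm[i]."""
    seen = [False] * len(perm)
    lengths = []
    for start in range(len(perm)):
        if seen[start]:
            continue
        length, j = 0, start
        while not seen[j]:
            seen[j] = True
            j = perm[j]
            length += 1
        lengths.append(length)
    return lengths


def dihedral_group(n):
    """All 2n symmetries of a regular n-gon as permutations of its vertices."""
    rotations = [tuple((i + r) % n for i in range(n)) for r in range(n)]
    reflections = [tuple((r - i) % n for i in range(n)) for r in range(n)]
    return rotations + reflections


def count_fixed(perm, content):
    """Count colourings with the given element counts that perm leaves unchanged.

    A colouring is fixed exactly when every cycle of perm carries one colour,
    so each cycle is assigned a colour and the totals are checked.
    """
    lengths = cycle_lengths(perm)
    total = 0
    for choice in product(range(len(content)), repeat=len(lengths)):
        used = [0] * len(content)
        for colour, length in zip(choice, lengths):
            used[colour] += length
        if tuple(used) == tuple(content):
            total += 1
    return total


def count_isomers(group, content):
    """Burnside's lemma: number of orbits = average number of fixed colourings."""
    fixed = sum(count_fixed(g, content) for g in group)
    return fixed // len(group)


# Assumption: benzene stands in for a PAH; content is (n_B, n_N, n_C).
benzene = dihedral_group(6)
for k in range(4):
    print(k, "BN pairs:", count_isomers(benzene, (k, k, 6 - 2 * k)))
```

For benzene this gives 1, 3, 11 and 3 distinct molecules for zero, one, two and three B,N pairs. The three singly doped rings correspond to the 1,2-, 1,3- and 1,4-arrangements of B and N. Enumerating the colour choice for every cycle is only practical for small rings; for a larger PAH one would expand the cycle index as a polynomial instead, which is the route Pólya's theorem formalises.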