MatGraphDB is a Python package designed to simplify graph-based data management and analysis in materials and molecular science. It enables researchers to efficiently transform complex theoretical data into structured graph representations, leveraging:
- High-performance storage: Utilizes Apache Parquet for scalable and rapid data access.
- Automated workflows: Converts theoretical and computational data into graph structures.
- Robust data operations: Offers comprehensive CRUD functionality and custom generators to maintain consistent relationships between entities.
Check out the docs
pip install matgraphdb
git clone
cd MatGraphDB
pip install -e .
from matgraphdb import MatGraphDB
# Initialize MatGraphDB
mgdb = MatGraphDB(storage_path="MatGraphDB")
You can add any material to the database by providing either a structure, or coords, species, and lattice, and then calling create_material or create_materials. Every material added to the database is indexed; the index is stored as the material's id.
from pymatgen.core import Structure

# Illustrative structure (MgO); any pymatgen Structure can be used here
structure = Structure(
    lattice=[[0, 2.13, 2.13], [2.13, 0, 2.13], [2.13, 2.13, 0]],
    species=["Mg", "O"],
    coords=[[0, 0, 0], [0.5, 0.5, 0.5]],
)

# Add material to the database
material_data_1 = {
    "structure": structure,
    "properties": {
        "material_id": "mp-1",
        "source": "example",
        "thermal_conductivity": {"value": 2.5, "unit": "W/mK"},
    },
}

# or by coords, species, lattice
material_data_2 = {
    "coords": [[0, 0, 0], [0.5, 0.5, 0.5]],
    "species": ["Mg", "O"],
    "lattice": [[0, 2.13, 2.13], [2.13, 0, 2.13], [2.13, 2.13, 0]],
    "properties": {
        "material_id": "mp-2",
        "source": "example_manual",
        "band_gap": {"value": 1.2, "unit": "eV"},
    },
}
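# Add material by structure (assumes the dict keys match the keyword arguments)
result = mgdb.create_material(**material_data_1)
# Add multiple materials (assumes a list of such dicts is accepted)
materials = mgdb.create_materials([material_data_1, material_data_2])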
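Since a material can be described in two ways, it can help to check a record before handing it to the database. The following is a minimal sketch using only the standard library; check_material_record and MANUAL_KEYS are hypothetical names for illustration and are not part of MatGraphDB.

```python
from typing import Any, Mapping

# Hypothetical helper, not part of MatGraphDB: a record must describe the
# material either by "structure" or by "coords" + "species" + "lattice".
MANUAL_KEYS = ("coords", "species", "lattice")


def check_material_record(record: Mapping[str, Any]) -> str:
    """Return "structure" or "manual" depending on how the material is given.

    Raises ValueError if the record is incomplete or inconsistent.
    """
    if record.get("structure") is not None:
        return "structure"

    missing = [key for key in MANUAL_KEYS if record.get(key) is None]
    if missing:
        raise ValueError(
            f"record needs 'structure' or all of {MANUAL_KEYS}; missing {missing}"
        )

    # One species label per site
    if len(record["coords"]) != len(record["species"]):
        raise ValueError("coords and species must have one entry per site")

    # The lattice is given as three 3-component vectors
    lattice = record["lattice"]
    if len(lattice) != 3 or any(len(row) != 3 for row in lattice):
        raise ValueError("lattice must be a 3x3 matrix")

    return "manual"


material_data_2 = {
    "coords": [[0, 0, 0], [0.5, 0.5, 0.5]],
    "species": ["Mg", "O"],
    "lattice": [[0, 2.13, 2.13], [2.13, 0, 2.13], [2.13, 2.13, 0]],
    "properties": {"material_id": "mp-2", "source": "example_manual"},
}
print(check_material_record(material_data_2))  # prints "manual"
```

Running the check first keeps malformed records (for example, a missing lattice) from reaching the database call at all.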
To read materials from the database, use the read_materials method. Its columns parameter specifies which columns to read, and its filters parameter specifies the filter expressions to apply. Only the matching materials are loaded into memory.
import pyarrow.compute as pc

materials = mgdb.read_materials(
    columns=["material_id", "elements", "band_gap.value"],
    filters=[pc.field("band_gap.value") == 1.2],
)
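The column name band_gap.value suggests that nested property dictionaries are exposed as dotted column names. That behavior is an assumption inferred from the example, not something this README states. The sketch below mimics the idea in plain Python: flatten is a hypothetical helper, and the list comprehension stands in for the column selection and filter.

```python
from typing import Any, Dict, Mapping


def flatten(record: Mapping[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into a single dict with dotted keys."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, Mapping):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


rows = [
    flatten({"material_id": "mp-1", "band_gap": {"value": 3.6, "unit": "eV"}}),
    flatten({"material_id": "mp-2", "band_gap": {"value": 1.2, "unit": "eV"}}),
]
print(sorted(rows[1]))
# ['band_gap.unit', 'band_gap.value', 'material_id']

# Plain-Python analogue of columns=[...] and a band_gap.value == 1.2 filter
columns = ["material_id", "band_gap.value"]
matched = [
    {name: row.get(name) for name in columns}
    for row in rows
    if row.get("band_gap.value") == 1.2
]
print(matched)
# [{'material_id': 'mp-2', 'band_gap.value': 1.2}]
```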
To update materials in the database, use the update_materials method. Each update must include the id of the material to update. Alternatively, pass the update_keys parameter to specify which columns to match on; this is useful if you imported an existing dataset from another database.
# Update by id
update_data = [
    {"id": 0, "band_gap": {"value": 3.6, "unit": "eV"}},
]
materials = mgdb.update_materials(update_data)

# Update by material_id instead of id
update_data = [
    {"material_id": "mp-1", "band_gap": {"value": 3.6, "unit": "eV"}},
]
materials = mgdb.update_materials(update_data, update_keys=["material_id"])
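To make the role of update_keys concrete, here is a plain-Python sketch of key-based matching. It illustrates the general idea only and is not MatGraphDB's implementation; apply_updates is a hypothetical function, and the choice to replace nested values wholesale is an assumption of this sketch.

```python
from typing import Any, Dict, List, Sequence


def apply_updates(
    rows: List[Dict[str, Any]],
    updates: List[Dict[str, Any]],
    update_keys: Sequence[str] = ("id",),
) -> List[Dict[str, Any]]:
    """Return copies of rows, merging each update into the row that has the
    same values for every column in update_keys."""
    # Index the updates by their key values for direct lookup
    index = {tuple(update[key] for key in update_keys): update for update in updates}
    result = []
    for row in rows:
        patch = index.get(tuple(row.get(key) for key in update_keys))
        # Values from the update overwrite the row's values; nested dicts are
        # replaced as a whole in this sketch
        result.append({**row, **patch} if patch else dict(row))
    return result


rows = [
    {"id": 0, "material_id": "mp-1", "band_gap": {"value": 1.2, "unit": "eV"}},
    {"id": 1, "material_id": "mp-2", "band_gap": {"value": 0.5, "unit": "eV"}},
]
update_data = [{"material_id": "mp-1", "band_gap": {"value": 3.6, "unit": "eV"}}]

updated = apply_updates(rows, update_data, update_keys=["material_id"])
print(updated[0]["id"], updated[0]["band_gap"]["value"])  # prints: 0 3.6
```

Matching on material_id lets the update omit the internal id, which is the situation described for datasets imported from another database.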
To delete materials from the database, use the delete_materials method and pass the list of ids to delete.
materials = mgdb.delete_materials(ids=[0])
Contributions are welcome! Please open an issue or submit a pull request on GitHub.
This project is licensed under the MIT License. See the LICENSE file for details.
Logan Lang, Aldo Romero, Eduardo Hernandez,
Note: This project is in its initial stages. Features and APIs are subject to change. Please refer to the latest documentation and release notes for updates.