[FEATURE] Integrate with geopandas for Native Geospatial Data Handling #530

Open
10 tasks
amrutha97 opened this issue Mar 24, 2025 · 1 comment · May be fixed by #733
Labels
enhancement New feature or request

Comments

@amrutha97
Member

Goal

Add built-in support for reading, transforming, and displaying geospatial data via the geopandas library, making it easier to build map-based dashboards and spatial analytics apps with Preswald.


📌 Motivation

Preswald users are increasingly working with geospatial datasets (shapefiles, GeoJSON, and spatial CSVs), but they currently have to transform and flatten geometry fields by hand before displaying them.

By natively integrating with geopandas, Preswald can:

  • Seamlessly support geospatial file formats
  • Simplify loading and preprocessing of spatial data
  • Prepare data for use with the upcoming geo() component or Plotly maps
  • Unlock use cases in real estate, environment, logistics, and more

✅ Acceptance Criteria

  • Add geopandas as a supported backend dependency (pip install geopandas)
  • Automatically use geopandas.read_file() when:
    • type = "geojson" or type = "shapefile" in preswald.toml
  • Handle loading of:
    • .geojson, .shp, .gpkg
  • Convert GeoDataFrame to a regular DataFrame with flattened geometry (WKT or GeoJSON format)
  • Add flatten_geometry = true|false toggle in data config
  • Ensure compatibility with get_df() and downstream components (table(), plotly(), etc.)
  • Raise informative errors if geopandas is missing or file path is invalid
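
The last criterion (informative errors) could be handled by a small pre-flight check before geopandas.read_file() is called. The following is a minimal sketch, not Preswald's actual API: GeoSourceError, check_geo_source, and the module_name parameter are hypothetical names introduced here for illustration.

```python
import importlib.util
from pathlib import Path


class GeoSourceError(RuntimeError):
    """Hypothetical error type for problems with a geospatial source."""


def check_geo_source(path, module_name="geopandas"):
    """Fail early, with a readable message, before any file is read.

    module_name is a parameter only so the check can be exercised
    without geopandas installed; it is an assumption of this sketch.
    """
    # find_spec returns None when a top-level module is not installed.
    if importlib.util.find_spec(module_name) is None:
        raise GeoSourceError(
            f"{module_name} is required for geospatial sources. "
            "Install it with: pip install geopandas"
        )
    resolved = Path(path)
    if not resolved.is_file():
        raise GeoSourceError(f"Geospatial file not found: {resolved}")
    return resolved
```

Checking for the module with find_spec avoids importing geopandas just to discover it is missing, and it keeps the error message under the loader's control rather than surfacing a bare ImportError.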

🛠 Implementation Plan

1. Update Data Loader in data.py

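import geopandas as gpd
import pandas as pd

def load_geospatial_source(config):
    """Load a geospatial file; optionally flatten geometry to GeoJSON-style dicts."""
    gdf = gpd.read_file(config["path"])

    if not config.get("flatten_geometry", True):
        return gdf

    # Convert to a plain DataFrame first: the active geometry column of a
    # GeoDataFrame is expected to hold geometry objects, not dicts.
    geom_col = gdf.geometry.name
    df = pd.DataFrame(gdf)
    # Missing geometries stay None instead of raising AttributeError.
    df[geom_col] = [None if g is None else g.__geo_interface__ for g in gdf.geometry]
    return df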

Route a source to this loader when its path ends in .geojson, .shp, or .gpkg, or when preswald.toml sets type = "geojson" or type = "shapefile".
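
The detection rule could look roughly like the sketch below. The names is_geospatial_source, GEO_TYPES, and GEO_EXTENSIONS are hypothetical, and the precedence (an explicit type wins over the file extension) is an assumption, since the issue does not say which should take priority.

```python
from pathlib import Path

# Values taken from the acceptance criteria in this issue.
GEO_TYPES = {"geojson", "shapefile"}
GEO_EXTENSIONS = {".geojson", ".shp", ".gpkg"}


def is_geospatial_source(config):
    """Return True if a data-source config should use the geospatial loader."""
    source_type = str(config.get("type", "")).strip().lower()
    if source_type:
        # Assumption: an explicit type always overrides the file extension.
        return source_type in GEO_TYPES
    suffix = Path(str(config.get("path", ""))).suffix.lower()
    return suffix in GEO_EXTENSIONS
```

For example, {"path": "data/parcels.GPKG"} would be routed to the geospatial loader by extension, while {"type": "csv", "path": "export.geojson"} would not, because the explicit type takes precedence in this sketch.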

2. Example preswald.toml

[data.city_boundaries]
type = "geojson"
path = "data/cities.geojson"
flatten_geometry = true

🧪 Testing Plan

  • Load sample .geojson and .shp files
  • Confirm connect() and get_df() return a valid DataFrame
  • Use table(df) and plotly() to inspect spatial columns
  • Test behavior with and without flatten_geometry
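
For the first item, a test suite needs a small, known .geojson file. One option is to generate it with the standard library so the fixture carries no binary assets. The helper below is a sketch; write_sample_geojson and the feature values are made up for illustration and are not taken from the Preswald repository.

```python
import json
from pathlib import Path


def write_sample_geojson(directory):
    """Write a two-feature GeoJSON FeatureCollection and return its path."""
    collection = {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "properties": {"name": "Sample point"},
                "geometry": {"type": "Point", "coordinates": [-122.4, 37.8]},
            },
            {
                "type": "Feature",
                "properties": {"name": "Sample square"},
                # A polygon ring must end on the coordinate it starts with.
                "geometry": {
                    "type": "Polygon",
                    "coordinates": [
                        [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]]
                    ],
                },
            },
        ],
    }
    path = Path(directory) / "sample.geojson"
    path.write_text(json.dumps(collection), encoding="utf-8")
    return path
```

A test could call this inside a temporary directory, pass the returned path to the loader once with flatten_geometry = true and once with false, and compare the type of the geometry column in each result.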

📚 Docs To Update

  • docs/configuration.mdx → Add type = "geojson" / type = "shapefile" and the flatten_geometry option
  • docs/sdk/geo.mdx (future) → Add examples using geometry column
  • Add note about installing geopandas via extras:
    pip install preswald[geo]

🧩 Related Files

  • preswald/engine/managers/data.py
  • preswald.toml
  • Optional sample: examples/earthquakes.geojson

🔮 Future Enhancements

  • Detect and reproject coordinates (.to_crs())
  • Add spatial filter DSL (where geometry intersects...)
  • Integrate with geo() map-rendering component
  • Support live streaming geospatial data
@amrutha97 amrutha97 added the enhancement New feature or request label Mar 24, 2025
@sanjai-11

I'd love to work on this issue. Assign it to me if it's still available!

2 participants