Modeling Disruptions in the MTA Subway

Roughly 3.6 million Americans ride the New York City subway each day. That’s about 1.3 million more than the number of people who take flights in the United States every day. The New York City Metropolitan Transportation Authority (MTA) subway system is unique in its number of riders, its number of stations, and the level of service it provides.
Using Aura Graph Analytics, we can easily model what would happen if a station was fully closed for repairs. Insights like this can apply not just to transport systems but to supply chains, manufacturing processes, and much more.
For supply chains, imagine if a particular vendor was hit with tariffs or, even worse, went out of business. Which alternate supplier path gets your product to market fastest with the least disruption? Using the same shortest-path techniques demonstrated in this blog, you can quickly evaluate the next best route and keep operations running smoothly.
Aura Graph Analytics works with any enterprise data. In this example, we’re going to load data from Snowflake into dataframes, create a graph projection, and run algorithms – all without having to move our data into AuraDB!
Whether it’s passengers or products, understanding and adapting the paths through your network is key to resilience.
Getting Started
We’re going to work in a Google Colab notebook, but you can run this in any Python environment. First, we need to install and load the necessary packages:
!pip install graphdatascience
from graphdatascience.session import GdsSessions, AuraAPICredentials, DbmsConnectionInfo, AlgorithmCategory
from datetime import timedelta
import pandas as pd
import os
from google.colab import userdata
Create a Session
Next, we set up a session, first by loading in our secrets:
CLIENT_ID = userdata.get("CLIENT_ID")
CLIENT_SECRET = userdata.get("CLIENT_SECRET")
TENANT_ID = userdata.get("TENANT_ID")
Then by establishing a session:
from graphdatascience.session import GdsSessions, AuraAPICredentials, AlgorithmCategory, CloudLocation
from datetime import timedelta

sessions = GdsSessions(api_credentials=AuraAPICredentials(CLIENT_ID, CLIENT_SECRET, TENANT_ID))

name = "my-new-session-subway"
memory = sessions.estimate(
    node_count=20,
    relationship_count=50,
    algorithm_categories=[AlgorithmCategory.CENTRALITY, AlgorithmCategory.NODE_EMBEDDING],
)
cloud_location = CloudLocation(provider="gcp", region="europe-west1")

gds = sessions.get_or_create(
    session_name=name,
    memory=memory,
    ttl=timedelta(hours=5),
    cloud_location=cloud_location,
)
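As a quick sanity check, you can confirm the session is up by listing the sessions on your tenant (the sessions API exposes a list() call; the exact fields returned vary by client version):
# Should show "my-new-session-subway" among the active sessions
for session in sessions.list():
    print(session)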
Load Data From Snowflake
Load this data into Snowflake or directly into your Python environment.
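If you’d rather work without Snowflake, you could read the two tables straight into pandas instead; here’s a minimal sketch, assuming you’ve saved the dataset locally as CSV files (the file names below are placeholders):
import pandas as pd

# Placeholder paths; point these at wherever you saved the dataset
stations = pd.read_csv("stations.csv")  # columns: STATION_NAME, ID
lines = pd.read_csv("lines.csv")  # columns: STARTING_STATION, NEXT_STATION, RELATIONSHIPTYPE
The rest of the walkthrough is the same either way.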
A key advantage of Aura Graph Analytics is that you don’t need to store your data in AuraDB to use it. In our case, we’ll load data from Snowflake into Python dataframes. Let’s start by downloading the snowflake-connector-python package:
!pip install snowflake-connector-python
Then we create a connection to Snowflake:
import pandas as pd
import snowflake.connector
SNOWFLAKE_USER = userdata.get("snowflake_user")
SNOWFLAKE_PASSWORD = userdata.get("snowflake_password")
SNOWFLAKE_ACCOUNT = userdata.get("snowflake_account")
# Replace with your credentials
conn = snowflake.connector.connect(
    user=SNOWFLAKE_USER,
    password=SNOWFLAKE_PASSWORD,
    account=SNOWFLAKE_ACCOUNT,
    warehouse='GDSONSNOWFLAKE',
    database='MTA',
    schema='PUBLIC',
)
And return our two tables as Python dataframes:
cur = conn.cursor()
cur.execute("SELECT * FROM LINES")
lines = cur.fetch_pandas_all()
cur.close()
lines
| STARTING_STATION | NEXT_STATION | RELATIONSHIPTYPE |
| --- | --- | --- |
| 0 | 1 | GOES_TO |
| 1 | 2 | GOES_TO |
| 2 | 3 | GOES_TO |
| 3 | 4 | GOES_TO |
cur = conn.cursor()
cur.execute("SELECT * FROM stations")
stations = cur.fetch_pandas_all()
cur.close()
stations
| STATION_NAME | ID |
| --- | --- |
| Van Cortlandt Park-242 - Bx | 0 |
| 238 St - Bx | 1 |
| 231 St - Bx | 2 |
| Marble Hill-225 St - M | 3 |
| 215 St - M | 4 |
Creating a Projection
We need to do some mild cleanup to make sure everything has the right names.
For the dataframe representing nodes:
- The first column should be called nodeId.
- There can be no string (character) columns, so we have to drop the station names.
stations = stations.rename(columns={'ID': 'nodeId'})
nodes = stations[['nodeId']]
nodes
For the dataframe representing relationships, we need to have columns called sourceNodeId and targetNodeId:
lines = lines.rename(
    columns={
        'STARTING_STATION': 'sourceNodeId',
        'NEXT_STATION': 'targetNodeId'
    }
)
lines = lines[['sourceNodeId', 'targetNodeId']]
lines
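One thing to keep in mind: graph.construct creates directed relationships from these columns by default, so a path search can only travel in the direction the LINES table records. The trips in this post work as-is, but if your own data lists each segment in only one direction, you can add the reverse edges before projecting; a small sketch:
# Add the reverse of every segment so travel works in both directions;
# pass lines_bidirectional to graph.construct instead of lines if you need this.
reversed_lines = lines.rename(
    columns={'sourceNodeId': 'targetNodeId', 'targetNodeId': 'sourceNodeId'}
)
lines_bidirectional = pd.concat([lines, reversed_lines], ignore_index=True)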
Graph Construct
Using graph.construct, we can easily create a projection:
graph_name = "subways"
if gds.graph.exists(graph_name)["exists"]:
# Drop the graph if it exists
gds.graph.drop(graph_name)
print(f"Graph '{graph_name}' dropped.")
G = gds.graph.construct("subways", nodes, lines)
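It’s worth a quick check that the projection looks the way you expect. The Graph object returned by graph.construct exposes simple counts:
# Quick sanity check on the size of the projection
print(G.node_count(), G.relationship_count())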
We’ll use the Dijkstra shortest path algorithm to see how we can move through the system efficiently. The simple wrapper function below lets us use the names of stations rather than their nodeIds:
station_crosswalk = dict(zip(stations['STATION_NAME'], stations['nodeId']))

# Function to get the node IDs from station names and run Dijkstra
def get_shortest_path(source_station, target_station, G):
    # Map the station names to node IDs
    source_node_id = station_crosswalk.get(source_station)
    target_node_id = station_crosswalk.get(target_station)

    result = gds.shortestPath.dijkstra.stream(
        G,
        sourceNode=source_node_id,
        targetNode=target_node_id
    )

    node_ids = result['nodeIds'][0]
    id_to_station = {v: k for k, v in station_crosswalk.items()}
    ordered_subset = {id_to_station[i]: i for i in node_ids if i in id_to_station}
    return ordered_subset
Let’s see how to get from Grand Army Plaza in Brooklyn to Times Square:
# Example usage
# Assuming 'G' is your graph
source_station = "Grand Army Plaza - Bk"
target_station = "Times Sq-42 St - M"
# Call the function
path = get_shortest_path(source_station, target_station, G)
path
This returns:
{'Grand Army Plaza - Bk': 69,
'Bergen St - Bk': 68,
'Atlantic Av-Barclays Ctr - Bk': 67,
'Canal St - M': 32,
'14 St-Union Sq - M': 104,
'34 St-Herald Sq - M': 230,
'Times Sq-42 St - M': 24}
Modeling Disruptions
But what if one of those stations closed? What would be the quickest path there? Let’s see what would happen if Herald Square was closed:
def exclude_node(nodes_df, lines_df, node_to_exclude):
    closed = nodes_df[nodes_df['nodeId'] != node_to_exclude]
    closed_lines = lines_df[
        (lines_df['sourceNodeId'] != node_to_exclude) &
        (lines_df['targetNodeId'] != node_to_exclude)
    ]
    return closed, closed_lines

closed_nodes, closed_lines = exclude_node(nodes, lines, 230)
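Here, 230 is the nodeId for 34 St-Herald Sq. Rather than hard-coding it, you can look it up with the crosswalk we built earlier:
# Look up Herald Square's nodeId by name instead of hard-coding 230
herald_square_id = station_crosswalk["34 St-Herald Sq - M"]
closed_nodes, closed_lines = exclude_node(nodes, lines, herald_square_id)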
We then need to create a new projection without Herald Square:
graph_name = "exclude"
if gds.graph.exists(graph_name)["exists"]:
    # Drop the graph if it exists
    gds.graph.drop(graph_name)
    print(f"Graph '{graph_name}' dropped.")

G = gds.graph.construct(graph_name, closed_nodes, closed_lines)
Then we rerun our algorithm:
# Example usage
# Assuming 'G' is your graph
source_station = "Grand Army Plaza - Bk"
target_station = "Times Sq-42 St - M"
# Call the function
path = get_shortest_path(source_station, target_station, G)
path
Which returns:
{'Grand Army Plaza - Bk': 69,
'Bergen St - Bk': 68,
'Atlantic Av-Barclays Ctr - Bk': 67,
'Canal St - M': 32,
'Chambers St - M': 34,
'14 St - M': 29,
'34 St-Penn Station - M': 25,
'Times Sq-42 St - M': 24}
We can see that the detour is slightly longer than before: eight stations instead of seven!
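If you want to compare the two routes programmatically, here’s a quick sketch. It assumes you saved the first run under a separate name (baseline_path is hypothetical; the cells above reuse path for both results):
# baseline_path: result on the full network; path: result with Herald Square closed
print(f"{len(baseline_path)} stations originally, {len(path)} with the closure")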
Finally, we end our session:
sessions.delete(session_name="my-new-session-subway")
And that wraps up our analysis!
Summary and Next Steps
You’ve seen how to run graph algorithms against any enterprise data and how to model disruptions.
So now that you’ve got a solid grasp on modeling disruptions (whether that be on the subway or otherwise), head over to our GitHub repo for step-by-step instructions on how to do it yourself with Neo4j Aura Graph Analytics. You’ll find a Colab notebook, the full dataset, and everything you need to get started.
Prefer working in Snowflake? You can run the same example there using Neo4j Graph Analytics for Snowflake.