VR score

This computes the VR score from pluripotency and velocity metrics

Data

Load Cell x Metadata dataset, where the columns must contain a two-dimensional embedding, sample/condition and cluster metadata, as well as cellular pluripotency and trancriptomic rate of change metrics.

For the later two, we suggest using CCAT and RNA velocity vector lengths respectively.

Additional metadata layers, such as cell-cycle labels or key gene expression values, can also be added to the dataset and used for annotating the resulting landscapes in later steps, but they will not be used for the computation of VR scores.

Example Data: CRC organoids

Data from Qin & Cardoso Rodriguez et al. 2023.

base_dir = os.path.dirname(sys.path[0])
input_dir = f"{base_dir}/data/input"
output_dir = f"{base_dir}/data/output"

Relevant information and computation parameters are stored as a dictionary.

param_dict = {
    "input_csv":"md_QinCardoso23.csv",
    "dim_names":["phate1","phate2"],
    "potency":"CCAT",
    "dynamic":"velocity_length",
    "cluster":"curatedCLUST",
    "cluster_colours":{
        "TA 1":"#FF99CC",
        "Goblet / DCS":"#CC9900",
        "Late Enterocyte":"#006600",
        "proCSC":"#000099",
        "Early Enterocyte":"#00FF99",
        "revCSC":"#CC0000",
        "ER Stress":"#CC00FF",
        "CSC":"#6666FF",
        "TA 2":"#FF6666"
        },
    "condition":"condition"
}
cell_data = pd.read_csv(f"{input_dir}/{param_dict['input_csv']}")
cell_data
phate1 phate2 CCAT velocity_length A K P WENR mac fib curatedCLUST OWNsig_stemO_UCell OWNsig_stemS_UCell condition
0 -0.003110 -0.003080 0.274730 186.242097 0 0 0 0 0 0 TA 1 0.000000 0.221648 WT
1 0.024983 0.029213 0.233824 152.058837 0 0 0 0 0 0 Goblet / DCS 0.000000 0.122389 WT
2 0.037142 0.007396 0.195300 164.789267 0 0 0 0 0 0 Late Enterocyte 0.000000 0.155778 WT
3 -0.021990 0.000884 0.359556 221.681973 0 0 0 0 0 0 proCSC 0.013370 0.085037 WT
4 0.041909 0.035762 0.230588 208.885837 0 0 0 0 0 0 Late Enterocyte 0.000000 0.058852 WT
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
5896 -0.003300 -0.006133 0.361533 197.806256 0 0 0 1 0 0 TA 1 0.179889 0.337778 WENR
5897 -0.028562 0.010894 0.443155 145.233683 0 0 0 1 0 0 proCSC 0.663185 0.256611 WENR
5898 -0.002911 -0.009425 0.392001 188.500721 0 0 0 1 0 0 CSC 0.136074 0.283296 WENR
5899 -0.024535 0.002836 0.441813 114.664223 0 0 0 1 0 0 proCSC 0.298593 0.203278 WENR
5900 -0.013252 0.004670 0.434362 191.966684 0 0 0 1 0 0 revCSC 0.512222 0.388148 WENR

5901 rows × 14 columns

fig, axs = plt.subplots(2,2)
fig.suptitle("Pluripotency and transcriptomic change on 2D space")
axs[0,0].scatter(
    cell_data[param_dict["dim_names"][0]],
    cell_data[param_dict["dim_names"][1]],
    c=cell_data[param_dict["cluster"]].map(param_dict["cluster_colours"]))
axs[0,1].scatter(
    cell_data[param_dict["dim_names"][0]],
    cell_data[param_dict["dim_names"][1]])
axs[1,0].scatter(
    cell_data[param_dict["dim_names"][0]],
    cell_data[param_dict["dim_names"][1]],
    c=cell_data[param_dict["potency"]], cmap="inferno")
axs[1,1].scatter(
    cell_data[param_dict["dim_names"][0]],
    cell_data[param_dict["dim_names"][1]],
    c=cell_data[param_dict["dynamic"]], cmap="inferno")
<matplotlib.collections.PathCollection>

Data pre-processing


source

check_inversevvl

 check_inversevvl (dataframe, param_dict)

Computes inverse of RNA velocity vector lengths if those are being used as the dynamic metric (i.e. param_dict[“dynamic]==“velocity_length”).

Type Details
dataframe Input Cell x Meta pandas dataframe
param_dict Parameter dictionary with “dynamic” key set.
Returns dict Updated parameter dictionary with “inverse_vvl” set as the default dynamic metric.
cell_data, param_dict = check_inversevvl(cell_data, param_dict)
Computing the inverse of the RNA velocity values

source

scale_metrics

 scale_metrics (dataframe, metrics=['CCAT', 'inverse_vvl'])

Scales key metrics from the dataframe to a (0,1) range.

Type Default Details
dataframe Input Cell x Meta pandas dataframe
metrics list [‘CCAT’, ‘inverse_vvl’] Metrics to scale. Should be found within the dataframe’s columns.
Returns pandas.DataFrame Updated Cell X Metadata dataframe with scaled metrics (in-place)
cell_data = scale_metrics(cell_data, [param_dict["potency"], 
    param_dict["dynamic"]])
cell_data
phate1 phate2 CCAT velocity_length A K P WENR mac fib curatedCLUST OWNsig_stemO_UCell OWNsig_stemS_UCell condition inverse_vvl
0 -0.003110 -0.003080 0.339365 186.242097 0 0 0 0 0 0 TA 1 0.000000 0.221648 WT 0.147660
1 0.024983 0.029213 0.217284 152.058837 0 0 0 0 0 0 Goblet / DCS 0.000000 0.122389 WT 0.231605
2 0.037142 0.007396 0.102314 164.789267 0 0 0 0 0 0 Late Enterocyte 0.000000 0.155778 WT 0.196273
3 -0.021990 0.000884 0.592517 221.681973 0 0 0 0 0 0 proCSC 0.013370 0.085037 WT 0.087964
4 0.041909 0.035762 0.207627 208.885837 0 0 0 0 0 0 Late Enterocyte 0.000000 0.058852 WT 0.107182
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
5896 -0.003300 -0.006133 0.598417 197.806256 0 0 0 1 0 0 TA 1 0.179889 0.337778 WENR 0.125830
5897 -0.028562 0.010894 0.842009 145.233683 0 0 0 1 0 0 proCSC 0.663185 0.256611 WENR 0.253098
5898 -0.002911 -0.009425 0.689346 188.500721 0 0 0 1 0 0 CSC 0.136074 0.283296 WENR 0.143186
5899 -0.024535 0.002836 0.838003 114.664223 0 0 0 1 0 0 proCSC 0.298593 0.203278 WENR 0.380759
5900 -0.013252 0.004670 0.815766 191.966684 0 0 0 1 0 0 revCSC 0.512222 0.388148 WENR 0.136525

5901 rows × 15 columns

Compute VR scores

Here we split by necessary variables and compute VR score


source

compute_distdeg

 compute_distdeg (input_df, dim_name=['phate1', 'phate2'],
                  split_by='curatedCLUST', knn=30,
                  distance_mode='centroid', distance_metric='manhattan',
                  scale=True, plot=False, rmv_outliers=False)

Computes distances and degrees of cells within a cluster. Results will get scaled within the cluster level so it handles them as independent entities.

Type Default Details
input_df
dim_name list [‘phate1’, ‘phate2’]
split_by str curatedCLUST
knn int 30
distance_mode str centroid
distance_metric str manhattan
scale bool True
plot bool False
rmv_outliers bool False
Returns pandas.DataFrame Dataframe

source

bycluster_median

 bycluster_median (dataframe, param_dict)

Compute median of potency and dynamic metrics on a per-cluster basis.

Type Details
dataframe Input Cell x Meta pandas dataframe
param_dict Parameter dictionary with “potency”, “dynamic”, and “cluster” keys set.
Returns pandas.DataFrame Updated Cell X Metadata dataframe with median metrics.

source

compute_vr_scores

 compute_vr_scores (dataframe, param_dict, global_dist=False)

Compute the Valley-Ridge scores.

Type Default Details
dataframe Input Cell x Meta pandas dataframe
param_dict Parameter dictionary with “potency” and “dynamic” keys set.
global_dist bool False
Returns pandas.DataFrame Dataframe
score_data = compute_vr_scores(cell_data, param_dict, global_dist=True)
fig = make_subplots(
    rows=2, cols=2,
    specs=[
        [{"type": "scatter3d"}, {"type": "scatter3d"}],
        [{"type": "scatter3d"}, {"type": "scatter3d"}]],
)

fig.add_trace(
    go.Scatter3d(
        x=score_data[score_data["condition"]=="WT"][param_dict["dim_names"][0]], 
        y=score_data[score_data["condition"]=="WT"][param_dict["dim_names"][1]], 
        z=score_data[score_data["condition"]=="WT"]["VR"],
        mode="markers", marker=dict(
            color=score_data[score_data["condition"]=="WT"][param_dict["cluster"]].map(param_dict["cluster_colours"]))
    ),
    row=1, col=1)

fig.add_trace(
    go.Scatter3d(
        x=score_data[score_data["condition"]=="WTfib"][param_dict["dim_names"][0]], 
        y=score_data[score_data["condition"]=="WTfib"][param_dict["dim_names"][1]], 
        z=score_data[score_data["condition"]=="WTfib"]["VR"],
        mode="markers", marker=dict(
            color=score_data[score_data["condition"]=="WTfib"][param_dict["cluster"]].map(param_dict["cluster_colours"]))
    ),
    row=1, col=2)

fig.add_trace(
    go.Scatter3d(
        x=score_data[score_data["condition"]=="CRC"][param_dict["dim_names"][0]], 
        y=score_data[score_data["condition"]=="CRC"][param_dict["dim_names"][1]], 
        z=score_data[score_data["condition"]=="CRC"]["VR"],
        mode="markers", marker=dict(
            color=score_data[score_data["condition"]=="CRC"][param_dict["cluster"]].map(param_dict["cluster_colours"]))
    ),
    row=2, col=1)

fig.add_trace(
    go.Scatter3d(
        x=score_data[score_data["condition"]=="WENR"][param_dict["dim_names"][0]], 
        y=score_data[score_data["condition"]=="WENR"][param_dict["dim_names"][1]], 
        z=score_data[score_data["condition"]=="WENR"]["VR"],
        mode="markers", marker=dict(
            color=score_data[score_data["condition"]=="WENR"][param_dict["cluster"]].map(param_dict["cluster_colours"]))
    ),
    row=2, col=2)

fig.update_traces(marker_size = 2)
fig.update_layout(
    autosize=True,
    scene_aspectmode="cube",
    width=1000, height=1000,
    margin=dict(l=50, r=50, b=50, t=90))

fig.show()

The resulting VR score dataset can be saved to a file:

And so can the dictionary with parameters, to be used in subsequent steps:

param_dict["scores_csv"] = "vr_QinCardoso23.csv"
with open(f"{output_dir}/param_dict.json", "w") as param_json:
    json.dump(param_dict, param_json)
score_data.to_csv(f"{output_dir}/{param_dict['scores_csv']}", index=False)

Unit tests will be implemented in the future once the alphashape-based distance computation is finished.

# test_eq()