plugy is a Python module for the analysis of spectrometric reads from plug microfluidics. Plug-based microfluidics is a three-phase system in which fluorinated oil, the continuous phase, carries large (~500 nl) aqueous droplets (plugs) separated by mineral oil plugs of similar size. During spectrometric data acquisition the plugs pass through a transparent tube targeted by multiple laser beams, while photomultipliers (PMTs) detect the fluorescence. plugy reads and processes the recordings of the PMT data: it detects the plugs and quantifies their fluorescence across multiple channels. If the plug sequence comes from a drug combination screening experiment, where blue plugs (the so-called barcode) separate the samples (drug combinations) from each other, plugy can identify these samples, organize the data by sample and perform various statistics and visualizations.
plugy is available in the git repositories at https://git.embl.de/grp-merten/plugy and https://github.com/saezlab/plugy. Once you have Python 3 on your computer, install it with pip:
pip3 install git+https://git.embl.de/grp-merten/plugy.git@dev
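If you analyse a drug combination screen you need three input files: the PMT recording (pmt_file), the channel layout (channel_file) and the sample sequence (seq_file).
For processing any other kind of PMT recording, to identify plugs and samples, the first one is enough; no layout or sequence is needed.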
Normally we need to import these two submodules:
from plugy import exp
from plugy.data import config
Then we create a configuration object:
plugy_config = config.PlugyConfig(
pmt_file = 'Exp53_1.txt',
seq_file = 'sequence_9_drugs_and_one neg_Ctr.csv',
channel_file = 'Channel_9_drugs_and_one_neg_Ctr.csv',
auto_detect_cycles = True,
peak_max_width = 2.5,
ignore_qc_result = True,
normalize_using_control = True,
readout_analysis_column = 'readout_per_control_z_score',
figure_export_file_type = 'png',
)
Finally, using the configuration, we create an experiment object. By default the experiment object runs all the analysis and visualization upon creation. You will see later how to change this behaviour.
plug_exp = exp.PlugExperiment(plugy_config)
The experiment object contains data frames (pandas.DataFrame objects; read more about pandas here: https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html) with the raw PMT recordings, the plugs and the samples. The raw data is a very long data frame: at a 300 Hz acquisition rate it contains 300 rows per second of recording:
plug_exp.pmt_data.data
Let's see the plugs. In this experiment we detected 2,134 plugs. For each plug, plugy calculated its median fluorescence for each channel. It also identified whether the plug is part of the barcoding, and assigned a sample number and a cycle number. Each plug has a start and an end time; their difference gives the length of the plug which, assuming a steady flow rate, is proportional to its volume. Typically the length of a plug is around 0.5-2.5 seconds (approx. 100-600 nl at an 800 ul/h flow rate).
plug_exp.plug_data.plug_df
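The proportionality between plug length and volume mentioned above can be sketched in a few lines. This is only an illustration assuming a constant flow rate; the function name plug_volume_nl is made up for this example and is not part of plugy.

```python
def plug_volume_nl(length_s, flow_rate_ul_per_h=800.0):
    """Approximate plug volume in nanolitres from its length in seconds.

    Assumes a steady flow rate; illustrative helper, not a plugy function.
    """
    # 1 ul = 1000 nl and 1 h = 3600 s
    flow_rate_nl_per_s = flow_rate_ul_per_h * 1000.0 / 3600.0
    return length_s * flow_rate_nl_per_s


for length in (0.5, 2.5):
    print(f'{length} s -> {plug_volume_nl(length):.0f} nl')
```

At 800 ul/h this gives about 111 nl for a 0.5 s plug and about 556 nl for a 2.5 s plug, in line with the approximate range quoted above.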
Finally, another data frame is available with the samples. We create it from a copy of the previous data frame, but without the barcode plugs and the plugs discarded for any reason. In addition, if sequence and layout data are available, it contains the sample name, the names of the compounds, and z-scores calculated for each plug.
plug_exp.plug_data.sample_df
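As a reminder of what a z-score expresses, here is a minimal sketch using only the standard library. It computes plain z-scores over a list of made-up readouts with the sample standard deviation. Which plugs plugy uses as the reference population and which estimator it applies are not described here, so treat those details as assumptions of this example.

```python
import statistics


def z_scores(values):
    """Standardize values: subtract the mean, divide by the sample SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    return [(v - mean) / sd for v in values]


readouts = [1.0, 2.0, 3.0]  # made-up fluorescence readouts
print(z_scores(readouts))  # [-1.0, 0.0, 1.0]
```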
With these data frames you can do whatever is possible with pandas; for example, you can easily export the data to CSV files:
plug_exp.plug_data.sample_df.to_csv('exp53_samples.csv')
Often you don't want to run everything upon the creation of the experiment object: either your data doesn't contain samples or drug combinations and you just want to detect plugs, or you are getting an error and want to inspect the object, so you need access to the instance. In all cases when you don't want to run everything, pass the run = False parameter to the config. Then the object will be created without running anything.
plugy_config = config.PlugyConfig(
pmt_file = 'Exp53_1.txt',
seq_file = 'sequence_9_drugs_and_one neg_Ctr.csv',
channel_file = 'Channel_9_drugs_and_one_neg_Ctr.csv',
auto_detect_cycles = True,
peak_max_width = 2.5,
ignore_qc_result = True,
normalize_using_control = True,
readout_analysis_column = 'readout_per_control_z_score',
figure_export_file_type = 'png',
run = False, # <---- see here
)
plug_exp_norun = exp.PlugExperiment(plugy_config)
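The configurations in this tutorial differ from each other only in a few options. One possible way to avoid repeating the shared ones is to keep them in a dictionary and unpack it into the config. This is just a convenience sketch: the helper with_overrides is made up for this example, and the call to PlugyConfig is left commented out so the snippet runs without plugy installed.

```python
# Options shared by the examples in this tutorial
base_params = dict(
    pmt_file='Exp53_1.txt',
    seq_file='sequence_9_drugs_and_one neg_Ctr.csv',
    channel_file='Channel_9_drugs_and_one_neg_Ctr.csv',
    auto_detect_cycles=True,
    peak_max_width=2.5,
    ignore_qc_result=True,
    normalize_using_control=True,
    readout_analysis_column='readout_per_control_z_score',
    figure_export_file_type='png',
)


def with_overrides(base, **overrides):
    """Return a new dict of options; the base dict is left unchanged."""
    params = dict(base)
    params.update(overrides)
    return params


norun_params = with_overrides(base_params, run=False)

# With plugy installed, the dict can be unpacked into the config:
# from plugy.data import config
# plugy_config = config.PlugyConfig(**norun_params)
```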
After creating such an object, you can manually call the methods step by step. For example, to read the PMT data and detect the plugs you can do:
plug_exp_norun.setup()
plug_exp_norun.load()
plug_exp_norun.detect_plugs()
Then we have the data frame with the plugs:
plug_exp_norun.plug_data.plug_df
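To give an intuition for the kind of information a row of the plug data frame holds (start, end, median fluorescence), here is a toy sketch that finds plugs in a single made-up channel by simple thresholding. It is not plugy's detection algorithm, and the function and variable names are invented for this illustration.

```python
import statistics


def detect_plugs(times, signal, threshold):
    """Return (start, end, median) for each run of samples above threshold."""
    plugs = []
    start_idx = None
    for i, value in enumerate(signal):
        if value > threshold and start_idx is None:
            start_idx = i  # a plug begins
        elif value <= threshold and start_idx is not None:
            segment = signal[start_idx:i]
            plugs.append(
                (times[start_idx], times[i - 1], statistics.median(segment))
            )
            start_idx = None  # the plug ended
    if start_idx is not None:  # the trace ended inside a plug
        segment = signal[start_idx:]
        plugs.append(
            (times[start_idx], times[-1], statistics.median(segment))
        )
    return plugs


# Made-up single-channel trace sampled at 10 Hz
signal = [0.0, 0.1, 0.9, 1.0, 1.1, 0.1, 0.0, 0.8, 0.9, 0.7, 0.0]
times = [i / 10 for i in range(len(signal))]

for start, end, median in detect_plugs(times, signal, threshold=0.5):
    print(f'plug from {start} s to {end} s, median {median}')
```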
If you pass the run = False and init = True parameters, the experiment object will be created with the PMT data read but without detecting plugs.
plugy_config = config.PlugyConfig(
pmt_file = 'Exp53_1.txt',
seq_file = 'sequence_9_drugs_and_one neg_Ctr.csv',
channel_file = 'Channel_9_drugs_and_one_neg_Ctr.csv',
auto_detect_cycles = True,
peak_max_width = 2.5,
ignore_qc_result = True,
normalize_using_control = True,
readout_analysis_column = 'readout_per_control_z_score',
figure_export_file_type = 'png',
run = False,
init = True, # <---- see here
)
plug_exp_raw = exp.PlugExperiment(plugy_config)
To run the workflow only until plug detection, pass the run = False and plugs = True parameters.
plugy_config = config.PlugyConfig(
pmt_file = 'Exp53_1.txt',
seq_file = 'sequence_9_drugs_and_one neg_Ctr.csv',
channel_file = 'Channel_9_drugs_and_one_neg_Ctr.csv',
auto_detect_cycles = True,
peak_max_width = 2.5,
ignore_qc_result = True,
normalize_using_control = True,
readout_analysis_column = 'readout_per_control_z_score',
figure_export_file_type = 'png',
run = False,
plugs = True, # <---- see here
)
plug_exp_plugs = exp.PlugExperiment(plugy_config)
Sometimes you just want the data frame with the samples without generating all the figures. For this, pass the run = False and samples = True parameters.
plugy_config = config.PlugyConfig(
pmt_file = 'Exp53_1.txt',
seq_file = 'sequence_9_drugs_and_one neg_Ctr.csv',
channel_file = 'Channel_9_drugs_and_one_neg_Ctr.csv',
auto_detect_cycles = True,
peak_max_width = 2.5,
ignore_qc_result = True,
normalize_using_control = True,
readout_analysis_column = 'readout_per_control_z_score',
figure_export_file_type = 'png',
run = False,
samples = True, # <---- see here
)
plug_exp_samples = exp.PlugExperiment(plugy_config)
Due to various technical issues of the experiments it might happen that in some of the cycles the number of identified samples doesn't match the expected number, hence the identity of the samples can't be determined. In most cases you can fix this error by increasing the maximum allowed plug length or by adjusting the ratio of the barcoding channel.
Increase the maximum allowed plug length if, for example, you have fused plugs. Use the peak_max_width parameter for this.
plugy_config = config.PlugyConfig(
pmt_file = 'cycle_III_with adaptor_Not good/Exp54_cycle_III.txt',
seq_file = 'sequence_9_drugs_and_pos_neg_Ctr.csv',
channel_file = 'Channel_9_drugs_and_pos_neg_Ctr.csv',
auto_detect_cycles = True,
ignore_qc_result = True,
normalize_using_control = True,
readout_analysis_column = 'readout_per_control_z_score',
figure_export_file_type = 'png',
run = False,
plugs = True,
)
plug_exp = exp.PlugExperiment(plugy_config)
As you see in the output: "Found 1 cycles, 0 with the expected number of samples". Notice that we passed the parameters run = False and plugs = True; if we went further than plug detection, we would get an error. Let's try with peak_max_width increased to 7 seconds:
plugy_config = config.PlugyConfig(
pmt_file = 'cycle_III_with adaptor_Not good/Exp54_cycle_III.txt',
seq_file = 'sequence_9_drugs_and_pos_neg_Ctr.csv',
channel_file = 'Channel_9_drugs_and_pos_neg_Ctr.csv',
auto_detect_cycles = True,
peak_max_width = 7,
ignore_qc_result = True,
normalize_using_control = True,
readout_analysis_column = 'readout_per_control_z_score',
figure_export_file_type = 'png',
run = False,
plugs = True,
)
plug_exp = exp.PlugExperiment(plugy_config)
This way we could detect the long, fused plugs, so the number of samples matches.
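The effect of a maximum width can be illustrated with a toy filter over made-up plug lengths. This is only a sketch of the idea: how plugy treats peaks wider than peak_max_width internally is not shown here, and the function keep_plugs is invented for this example.

```python
def keep_plugs(lengths_s, peak_max_width):
    """Keep only plugs no longer than the maximum allowed width (seconds)."""
    return [length for length in lengths_s if length <= peak_max_width]


# Made-up plug lengths in seconds; 5.4 s stands for a fused plug
lengths = [1.2, 1.0, 5.4, 1.1]

print(len(keep_plugs(lengths, peak_max_width=2.5)))  # 3: fused plug dropped
print(len(keep_plugs(lengths, peak_max_width=7)))    # 4: fused plug kept
```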
Sometimes the blue barcoding dye is not the highest signal in the barcode plugs, or it is the highest also in some sample plugs. You can adjust the barcoding ratio manually, or you can scan a range to find the ratio which results in the highest number of detected cycles. For this, set barcoding_method to "adaptive" and use the barcoding_param option. Here we scan a range from 0.3 to 1.6:
plugy_config = config.PlugyConfig(
pmt_file = 'cycle_III_with adaptor_Not good/Exp54_cycle_III.txt',
seq_file = 'sequence_9_drugs_and_pos_neg_Ctr.csv',
channel_file = 'Channel_9_drugs_and_pos_neg_Ctr.csv',
auto_detect_cycles = True,
peak_max_width = 7,
barcoding_method = 'adaptive',
barcoding_param = {'ratio': (0.3, 1.6)},
ignore_qc_result = True,
normalize_using_control = True,
readout_analysis_column = 'readout_per_control_z_score',
figure_export_file_type = 'png',
run = False,
plugs = True,
)
plug_exp = exp.PlugExperiment(plugy_config)
In this example the ratio doesn't really matter, so finally the value 0.3 was selected.
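The idea of scanning a range of ratios can be sketched with a toy example. Everything here is an assumption made for illustration: the rule in is_barcode (blue must exceed the ratio times the strongest other channel), the scoring by sample count within a single cycle, and the number of steps. plugy's own criterion is the number of detected cycles, and its implementation may differ in all of these details.

```python
def is_barcode(plug, ratio):
    """Toy rule: barcode if blue exceeds ratio times the strongest other channel."""
    blue, green, orange = plug
    return blue > ratio * max(green, orange)


def count_samples(plugs, ratio):
    """Count runs of consecutive non-barcode plugs."""
    n_samples, in_sample = 0, False
    for plug in plugs:
        if is_barcode(plug, ratio):
            in_sample = False
        elif not in_sample:
            n_samples += 1
            in_sample = True
    return n_samples


def scan_ratio(plugs, expected, low, high, n_steps=14):
    """Return the first ratio in [low, high] whose sample count is closest to expected."""
    step = (high - low) / (n_steps - 1)
    candidates = [low + i * step for i in range(n_steps)]
    # max() returns the first best candidate, so ties go to the lowest ratio
    return max(
        candidates,
        key=lambda r: -abs(count_samples(plugs, r) - expected),
    )


barcode = (1.0, 0.1, 0.1)  # (blue, green, orange), made-up values
sample = (0.05, 0.8, 0.3)
plugs = [barcode, sample, sample, barcode, sample, sample, barcode]

print(scan_ratio(plugs, expected=2, low=0.3, high=1.6))  # 0.3
```

In this toy data every ratio in the range separates barcode and sample plugs equally well, so the first candidate, 0.3, is returned.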
If the two methods above don't work, first inspect the plot of the raw data (pmt_overview.png) and try to find out what the issue is. Sometimes you see that between certain samples there is only one barcode plug, or more than 12, or that between the cycles there are fewer than 12 barcode plugs, or that certain samples contain fewer than 3 plugs. To address these issues you can adjust the min_between_samples_barcodes, min_end_cycle_barcodes or min_plugs_in_sample parameters. There are many further parameters to fine-tune the sample detection, but these are beyond the scope of this tutorial.
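When inspecting a problematic recording, it can help to tabulate how many barcode plugs separate the samples and how many plugs each sample contains. Below is a minimal sketch on a made-up label sequence. The 'B'/'S' encoding is an assumption of this example, and how you derive such labels from your own data is left open.

```python
from itertools import groupby


def run_lengths(labels):
    """Collapse a label sequence into (label, run length) pairs."""
    return [(label, len(list(group))) for label, group in groupby(labels)]


# Made-up plug sequence: 'B' = barcode plug, 'S' = sample plug
labels = 'BBBSSSBSSSBBBSSBBB'

runs = run_lengths(labels)
barcode_runs = [n for label, n in runs if label == 'B']
sample_runs = [n for label, n in runs if label == 'S']

print('shortest barcode run:', min(barcode_runs))
print('smallest sample:', min(sample_runs))
```

Numbers like these may suggest which values to try for min_between_samples_barcodes or min_plugs_in_sample.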
Sometimes, for technical reasons, you record the data in multiple sessions, but for the analysis and visualization you want to concatenate these sequences. For this, create a config and an experiment object for each recording except the first one, using the run = False and samples = True parameters:
plugy_config_ii = config.PlugyConfig(
pmt_file = 'cycle_II_wo adaptor/Exp54_cycle_II_1.txt',
seq_file = 'sequence_9_drugs_and_pos_neg_Ctr.csv',
channel_file = 'Channel_9_drugs_and_pos_neg_Ctr.csv',
auto_detect_cycles = True,
peak_max_width = 2.5,
ignore_qc_result = True,
normalize_using_control = True,
readout_analysis_column = 'readout_per_control_z_score',
figure_export_file_type = 'png',
# we don't want to run the entire thing
run = False,
# but want to run until sample detection
samples = True,
)
plugy_config_iii = config.PlugyConfig(
pmt_file = 'cycle_III_with adaptor_Not good/Exp54_cycle_III.txt',
seq_file = 'sequence_9_drugs_and_pos_neg_Ctr.csv',
channel_file = 'Channel_9_drugs_and_pos_neg_Ctr.csv',
auto_detect_cycles = True,
peak_max_width = 7,
ignore_qc_result = True,
normalize_using_control = True,
readout_analysis_column = 'readout_per_control_z_score',
figure_export_file_type = 'png',
# we don't want to run the entire thing
run = False,
# but want to run until sample detection
samples = True,
)
plug_exp_ii = exp.PlugExperiment(plugy_config_ii)
plug_exp_iii = exp.PlugExperiment(plugy_config_iii)
Finally, create a config and an experiment for the first recording, and provide all the subsequent experiment objects in the append parameter.
plugy_config = config.PlugyConfig(
pmt_file = 'cycle I wo adaptor/exp54_cycle_I.txt',
seq_file = 'sequence_9_drugs_and_pos_neg_Ctr.csv',
channel_file = 'Channel_9_drugs_and_pos_neg_Ctr.csv',
auto_detect_cycles = True,
peak_max_width = 2.5,
ignore_qc_result = True,
normalize_using_control = True,
readout_analysis_column = 'readout_per_control_z_score',
figure_export_file_type = 'png',
# we want to merge these 2,
# appending them to the end of the first one:
append = (plug_exp_ii, plug_exp_iii),
)
plug_exp = exp.PlugExperiment(plugy_config)
As you see, we ended up with 3 cycles from the three acquisition sessions. All the data has been combined in the data frames of the object created last (plug_exp):
plug_exp.plug_data.sample_df
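Conceptually, concatenating sessions means stacking their sample records and keeping the cycles distinguishable. The toy sketch below renumbers cycles consecutively across sessions. The record layout, the 'cycle' and 'sample' keys and the renumbering scheme are all assumptions of this example, not a description of how plugy merges its data frames.

```python
def concatenate_sessions(sessions):
    """Merge per-session sample records, renumbering cycles consecutively."""
    merged = []
    cycle_offset = 0
    for records in sessions:
        for rec in records:
            merged.append({**rec, 'cycle': rec['cycle'] + cycle_offset})
        # the next session continues after the last cycle seen so far
        if records:
            cycle_offset += max(rec['cycle'] for rec in records) + 1
    return merged


# Made-up records: each session contains a single cycle numbered 0
session_i = [{'cycle': 0, 'sample': 'A'}, {'cycle': 0, 'sample': 'B'}]
session_ii = [{'cycle': 0, 'sample': 'A'}]
session_iii = [{'cycle': 0, 'sample': 'B'}]

merged = concatenate_sessions([session_i, session_ii, session_iii])
print(sorted({rec['cycle'] for rec in merged}))  # [0, 1, 2]
```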
By default plugy saves the figures into the results directory. You can provide a different directory with the results_base_dir option. At each subsequent run in the same working directory, the figures will be overwritten. If you want to keep the old figures and create new directories, use the option result_subdirs = True. If you want to add a timestamp to these directory names, add the option timestamp_result_subdirs = True. Before version 0.5.0 this was the default behaviour.