Plugy guide

plugy is a Python module for the analysis of spectrometric reads from plug microfluidics. Plug-based microfluidics is a three-phase system: fluorinated oil is the continuous phase, carrying large (~500 nl) aqueous droplets (plugs) separated by mineral oil plugs of similar size. During spectrometric data acquisition the plugs pass through a transparent tube targeted by multiple laser beams, while photomultipliers (PMTs) detect the fluorescence. plugy reads and processes the recordings of the PMT data: it detects the plugs and quantifies their fluorescence across multiple channels. If the plug sequence comes from a drug combination screening experiment, where blue plugs (the so-called barcode) separate the samples (drug combinations) from each other, plugy can also identify these samples, organize the data by sample, and compute various statistics and visualizations.

1. Installation

plugy is available in the git repositories at https://git.embl.de/grp-merten/plugy and https://github.com/saezlab/plugy. Once you have Python 3 on your computer, install plugy with pip:

pip3 install git+https://git.embl.de/grp-merten/plugy.git@dev

2. Basic usage

If you analyse a drug combination screen you need three input files:

  • The PMT recordings
  • The valve (chip inlet) layout: which drug was connected to which inlet
  • The sample sequence: a list of the samples generated during the experiment

To process any other kind of PMT recording and identify plugs and samples, the first file is enough; the layout and the sequence are not needed.

Normally we need to import these two submodules:

In [2]:
from plugy import exp
from plugy.data import config

Then we create a configuration object:

In [10]:
plugy_config = config.PlugyConfig(
    pmt_file = 'Exp53_1.txt',
    seq_file = 'sequence_9_drugs_and_one neg_Ctr.csv',
    channel_file = 'Channel_9_drugs_and_one_neg_Ctr.csv',
    auto_detect_cycles = True,
    peak_max_width = 2.5,
    ignore_qc_result = True,
    normalize_using_control = True,
    readout_analysis_column = 'readout_per_control_z_score',
    figure_export_file_type = 'png',
)

Finally, using the configuration, we create an experiment object. By default the experiment object runs all the analysis and visualization upon creation. You will see later how to change this behaviour.

In [11]:
plug_exp = exp.PlugExperiment(plugy_config)
16.11.20 15:21:49 - plugy.data.exp - INFO - Initializing PlugExperiment
16.11.20 15:21:49 - plugy.data.bd - INFO - Creating ChannelMap object from file /home/denes/Dokumentumok/microfluidics/vida/data/exp53/Channel_9_drugs_and_one_neg_Ctr.csv
16.11.20 15:21:49 - plugy.data.bd - INFO - Reading file /home/denes/Dokumentumok/microfluidics/vida/data/exp53/Channel_9_drugs_and_one_neg_Ctr.csv
16.11.20 15:21:49 - plugy.data.bd - INFO - Reading PlugSequence from /home/denes/Dokumentumok/microfluidics/vida/data/exp53/sequence_9_drugs_and_one neg_Ctr.csv
16.11.20 15:21:49 - plugy.data.pmt - INFO - Creating PmtData object from file /home/denes/Dokumentumok/microfluidics/vida/data/exp53/Exp53_1.txt
16.11.20 15:21:49 - plugy.data.pmt - INFO - Reading file /home/denes/Dokumentumok/microfluidics/vida/data/exp53/Exp53_1.txt
16.11.20 15:21:49 - plugy.data.pmt - INFO - Detected uncompressed txt file
16.11.20 15:21:51 - plugy.data.pmt - INFO - Correcting acquisition time
16.11.20 15:21:52 - plugy.data.plug - INFO - Creating PlugData object
16.11.20 15:21:52 - plugy.data.plug - INFO - Finding plugs
16.11.20 15:21:52 - plugy.data.pmt - INFO - Merging plugs with centers closer than 0.20 seconds
Adjusting barcode detection [times=1]: 100%|██████████| 1/1 [00:00<00:00,  7.54it/s]
16.11.20 15:21:55 - plugy.data.plug - INFO - Found 2 cycles, 2 with the expected number of samples. Sample count deviations: 0=0, 1=0. Best barcode detection parameters: times=1.
16.11.20 15:21:55 - plugy.data.plug - INFO - Labelling samples with compound names
16.11.20 15:21:55 - plugy.data.exp - INFO - Calculating statistics
16.11.20 15:22:12 - plugy.data.exp - INFO - Plotted PMT data to /home/denes/Dokumentumok/microfluidics/plugy_new/notebooks/results/qc/pmt_overview.png
16.11.20 15:22:39 - plugy.data.exp - WARNING - Contamination above threshold (0.1398159072488064 > 0.03)
16.11.20 15:24:14 - plugy.data.exp - CRITICAL - Quality control failed due to the following reasons: Contamination above threshold (0.1398159072488064 > 0.03). See also the QC plots for more information. In case you still want to continue, you can set the `ignore_qc_result` config parameter to True.
16.11.20 15:24:14 - plugy.data.exp - INFO - Running drug combination analysis
16.11.20 15:24:17 - plugy.data.exp - INFO - Saving violin plots to /home/denes/Dokumentumok/microfluidics/plugy_new/notebooks/results/drug_comb_z_violins.png
16.11.20 15:24:27 - plugy.data.exp - INFO - Saving violin plots to /home/denes/Dokumentumok/microfluidics/plugy_new/notebooks/results/drug_comb_z_violins_by-cycle.png
16.11.20 15:24:33 - plugy.data.exp - INFO - Saving heatmap(s) to /home/denes/Dokumentumok/microfluidics/plugy_new/notebooks/results/drug_comb_z_heatmap.png
16.11.20 15:24:35 - plugy.data.exp - INFO - Saving heatmap(s) to /home/denes/Dokumentumok/microfluidics/plugy_new/notebooks/results/drug_comb_z_heatmap_by-cycle.png

3. How to access the data inside the experiment object

The experiment object contains data frames (pandas.DataFrame objects, read more about pandas here: https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html) with the raw PMT recordings, the plugs and the samples. The raw data is a very long data frame: with a 300 Hz acquisition rate it contains 300 rows per second of recording.

In [12]:
plug_exp.pmt_data.data
Out[12]:
time green orange uv
0 2054.813000 0.015259 0.012818 0.201117
1 2054.816333 0.016785 0.013733 0.191961
2 2054.819667 0.017090 0.011292 0.189825
3 2054.823000 0.017396 0.011597 0.196539
4 2054.826333 0.017090 0.012818 0.201422
... ... ... ... ...
1943095 8531.796333 0.000610 0.007019 0.005188
1943096 8531.799667 0.000305 0.011292 0.006714
1943097 8531.803000 0.000305 0.010681 0.006714
1943098 8531.806333 0.000305 0.007324 0.006104
1943099 8531.809667 0.000610 0.013123 0.007630

1943100 rows × 4 columns
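
As a quick sanity check on the table above, the row count and the acquisition rate together give the length of the recording. The sketch below uses only the numbers shown in the output; the function name is made up for illustration and is not part of plugy.

```python
def recording_duration(n_rows, rate_hz):
    """Seconds between the first and the last sample of an evenly sampled recording."""
    # n_rows samples are separated by n_rows - 1 sampling intervals
    return (n_rows - 1) / rate_hz


# Values taken from the output above: 1,943,100 rows at 300 Hz
duration = recording_duration(1943100, 300)
print(f'{duration:.2f} s, about {duration / 60:.0f} min')

# The same figure follows from the first and last time stamps in the table
time_span = 8531.809667 - 2054.813000
print(f'{time_span:.2f} s')
```

If the two numbers disagree for your own recording, the acquisition rate is probably not the one you assumed.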

Let's see the plugs. In this experiment we detected 2,134 plugs. For each plug, plugy calculated its median fluorescence in each channel. It also identified whether the plug is part of the barcoding, and assigned a sample number and a cycle number. Each plug has a start and an end time; their difference gives the length of the plug which, assuming a steady flow rate, is proportional to its volume. Typically the length of a plug is around 0.5-2.5 seconds (approx. 100-600 nl at an 800 µl/h flow rate).

In [13]:
plug_exp.plug_data.plug_df
Out[13]:
start_time end_time barcode_peak_median control_peak_median readout_peak_median readout_per_control barcode cycle_nr sample_nr discard name compound_a compound_b
0 2056.223000 2056.986333 0.123905 0.012818 0.011902 0.928538 True 0 0 True NaN NaN NaN
1 2058.256333 2059.329667 0.178685 0.013123 0.015259 1.162768 True 0 0 True NaN NaN NaN
2 2060.149667 2061.353000 0.249336 0.013123 0.020142 1.534862 True 0 0 True NaN NaN NaN
3 2062.153000 2063.156333 0.205390 0.013123 0.017090 1.302294 True 0 0 True NaN NaN NaN
4 2065.563000 2066.316333 0.082705 0.012818 0.008850 0.690435 True 0 0 True NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ...
2129 8353.953000 8355.193000 0.670644 0.013581 0.029603 2.179817 True 1 73 True NaN NaN NaN
2130 8356.086333 8357.049667 0.636311 0.014039 0.028993 2.065176 True 1 73 True NaN NaN NaN
2131 8357.953000 8358.909667 0.633869 0.013733 0.028993 2.111192 True 1 73 True NaN NaN NaN
2132 8359.236333 8359.969667 0.568712 0.013123 0.026246 2.000000 True 1 73 True NaN NaN NaN
2133 8361.406333 8362.469667 0.668661 0.013733 0.030213 2.200029 True 1 73 True NaN NaN NaN

2134 rows × 13 columns
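
The relation between plug length and volume mentioned above can be written out as a small calculation. This is a sketch under the assumption of a steady 800 µl/h flow rate, the figure quoted in the text; the function name is hypothetical and not part of plugy.

```python
def plug_volume_nl(start_time, end_time, flow_rate_ul_per_h):
    """Approximate plug volume in nanolitres, assuming a steady flow rate."""
    length_s = end_time - start_time
    # 1 ul/h equals 1000 nl per 3600 s
    flow_rate_nl_per_s = flow_rate_ul_per_h * 1000 / 3600
    return length_s * flow_rate_nl_per_s


# First row of the plug table above, at the assumed 800 ul/h flow rate
volume = plug_volume_nl(2056.223000, 2056.986333, 800)
print(f'{volume:.0f} nl')

# The typical length range of 0.5-2.5 s
low = plug_volume_nl(0, 0.5, 800)
high = plug_volume_nl(0, 2.5, 800)
print(f'{low:.0f}-{high:.0f} nl')
```

The second line of output reproduces the approximate 100-600 nl range given in the text.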

Finally, another data frame is available with the samples. It is created from a copy of the previous data frame, but without the barcode plugs and the plugs discarded for any reason. In addition, if sequence and layout data are available, it contains the sample name, the names of the compounds, and z-scores calculated for each plug.

In [14]:
plug_exp.plug_data.sample_df
Out[14]:
start_time end_time barcode_peak_median control_peak_median readout_peak_median readout_peak_z_score readout_per_control_z_score readout_per_control cycle_nr sample_nr name compound_a compound_b readout_media_norm readout_media_norm_z_score length
21 2103.499667 2105.433000 0.036012 0.291910 0.427564 -1.145513 -0.608381 1.464714 0 0 Cell Control FS FS 0.793676 -0.932854 1.933333
22 2106.199667 2107.429667 0.035096 0.307016 0.520035 -0.847285 -0.211035 1.693837 0 0 Cell Control FS FS 0.917933 -0.581419 1.230000
23 2108.039667 2109.103000 0.035401 0.275277 0.381176 -1.295118 -0.747143 1.384700 0 0 Cell Control FS FS 0.750462 -1.055076 1.063333
24 2109.996333 2111.099667 0.035096 0.281991 0.405896 -1.215394 -0.652293 1.439393 0 0 Cell Control FS FS 0.780168 -0.971059 1.103333
25 2111.836333 2112.813000 0.036012 0.268868 0.361644 -1.358111 -0.815884 1.345062 0 0 Cell Control FS FS 0.729095 -1.115507 0.976667
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2109 8296.136333 8297.583000 0.036927 0.805383 1.370433 1.895332 -0.197586 1.701592 1 73 Cell Control FS FS 1.245276 0.344398 1.446667
2110 8298.359667 8299.023000 0.038148 0.818812 1.083407 0.969644 -0.853891 1.323145 1 73 Cell Control FS FS 0.968439 -0.438576 0.663333
2111 8299.643000 8301.349667 0.038759 0.621052 1.322062 1.739329 0.543187 2.128746 1 73 Cell Control FS FS 1.558189 1.229406 1.706667
2112 8302.333000 8304.159667 0.037538 0.885647 1.698660 2.953893 0.177689 1.917988 1 73 Cell Control FS FS 1.404133 0.793692 1.826667
2113 8304.506333 8305.083000 0.037538 0.955535 2.389294 5.181254 1.187847 2.500478 1 73 Cell Control FS FS 1.830791 2.000404 0.576667

1144 rows × 16 columns

With these data frames you can do anything that pandas supports; for example, you can easily export the data to CSV files:

In [43]:
plug_exp.plug_data.sample_df.to_csv('exp53_samples.csv')
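
Beyond exporting, a typical next step is to summarize the plugs by sample. The sketch below does this with plain pandas on a small hand-made frame; its ten rows are copied from the sample data frame output above, and only four of the sixteen columns are kept. Nothing here is specific to plugy.

```python
import pandas as pd

# A few rows copied from the sample data frame shown above
sample_df = pd.DataFrame({
    'cycle_nr': [0] * 5 + [1] * 5,
    'sample_nr': [0] * 5 + [73] * 5,
    'name': ['Cell Control'] * 10,
    'readout_per_control_z_score': [
        -0.608381, -0.211035, -0.747143, -0.652293, -0.815884,
        -0.197586, -0.853891, 0.543187, 0.177689, 1.187847,
    ],
})

# One row per sample in each cycle: number of plugs and median z-score
summary = (
    sample_df
    .groupby(['cycle_nr', 'sample_nr', 'name'])['readout_per_control_z_score']
    .agg(['count', 'median'])
    .reset_index()
)

print(summary)
```

With real data, replace the hand-made frame with plug_exp.plug_data.sample_df; the summary can then be exported with to_csv in the same way.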

4. How to run only certain parts of the workflow

Often you don't want to run everything upon the creation of the experiment object. Perhaps your data doesn't contain samples or drug combinations and you just want to detect plugs, or you are getting an error and need access to the instance to inspect it. Whenever you don't want to run everything, pass the run = False parameter to the config. The object will then be created without running anything.

In [18]:
plugy_config = config.PlugyConfig(
    pmt_file = 'Exp53_1.txt',
    seq_file = 'sequence_9_drugs_and_one neg_Ctr.csv',
    channel_file = 'Channel_9_drugs_and_one_neg_Ctr.csv',
    auto_detect_cycles = True,
    peak_max_width = 2.5,
    ignore_qc_result = True,
    normalize_using_control = True,
    readout_analysis_column = 'readout_per_control_z_score',
    figure_export_file_type = 'png',
    run = False, # <---- see here
)

plug_exp_norun = exp.PlugExperiment(plugy_config)
16.11.20 15:48:03 - plugy.data.exp - INFO - Initializing PlugExperiment

After creating such an object, you can manually call the methods step by step. For example, to read the PMT data and detect the plugs you can do:

In [19]:
plug_exp_norun.setup()
plug_exp_norun.load()
plug_exp_norun.detect_plugs()
16.11.20 15:51:18 - plugy.data.bd - INFO - Creating ChannelMap object from file /home/denes/Dokumentumok/microfluidics/vida/data/exp53/Channel_9_drugs_and_one_neg_Ctr.csv
16.11.20 15:51:18 - plugy.data.bd - INFO - Reading file /home/denes/Dokumentumok/microfluidics/vida/data/exp53/Channel_9_drugs_and_one_neg_Ctr.csv
16.11.20 15:51:18 - plugy.data.bd - INFO - Reading PlugSequence from /home/denes/Dokumentumok/microfluidics/vida/data/exp53/sequence_9_drugs_and_one neg_Ctr.csv
16.11.20 15:51:18 - plugy.data.pmt - INFO - Creating PmtData object from file /home/denes/Dokumentumok/microfluidics/vida/data/exp53/Exp53_1.txt
16.11.20 15:51:18 - plugy.data.pmt - INFO - Reading file /home/denes/Dokumentumok/microfluidics/vida/data/exp53/Exp53_1.txt
16.11.20 15:51:18 - plugy.data.pmt - INFO - Detected uncompressed txt file
16.11.20 15:51:19 - plugy.data.pmt - INFO - Correcting acquisition time
16.11.20 15:51:19 - plugy.data.plug - INFO - Creating PlugData object
16.11.20 15:51:19 - plugy.data.plug - INFO - Finding plugs
16.11.20 15:51:20 - plugy.data.pmt - INFO - Merging plugs with centers closer than 0.20 seconds
Adjusting barcode detection [times=1]: 100%|██████████| 1/1 [00:00<00:00,  9.73it/s]
16.11.20 15:51:22 - plugy.data.plug - INFO - Found 2 cycles, 2 with the expected number of samples. Sample count deviations: 0=0, 1=0. Best barcode detection parameters: times=1.

Then we have the data frame with the plugs:

In [20]:
plug_exp_norun.plug_data.plug_df
Out[20]:
start_time end_time barcode_peak_median control_peak_median readout_peak_median readout_per_control barcode cycle_nr sample_nr discard
0 2056.223000 2056.986333 0.123905 0.012818 0.011902 0.928538 True 0 0 True
1 2058.256333 2059.329667 0.178685 0.013123 0.015259 1.162768 True 0 0 True
2 2060.149667 2061.353000 0.249336 0.013123 0.020142 1.534862 True 0 0 True
3 2062.153000 2063.156333 0.205390 0.013123 0.017090 1.302294 True 0 0 True
4 2065.563000 2066.316333 0.082705 0.012818 0.008850 0.690435 True 0 0 True
... ... ... ... ... ... ... ... ... ... ...
2129 8353.953000 8355.193000 0.670644 0.013581 0.029603 2.179817 True 1 73 True
2130 8356.086333 8357.049667 0.636311 0.014039 0.028993 2.065176 True 1 73 True
2131 8357.953000 8358.909667 0.633869 0.013733 0.028993 2.111192 True 1 73 True
2132 8359.236333 8359.969667 0.568712 0.013123 0.026246 2.000000 True 1 73 True
2133 8361.406333 8362.469667 0.668661 0.013733 0.030213 2.200029 True 1 73 True

2134 rows × 10 columns

4.1. Only read the PMT recordings

If you pass the run = False and init = True parameters the experiment object will be created with the PMT data read but without detecting plugs.

In [21]:
plugy_config = config.PlugyConfig(
    pmt_file = 'Exp53_1.txt',
    seq_file = 'sequence_9_drugs_and_one neg_Ctr.csv',
    channel_file = 'Channel_9_drugs_and_one_neg_Ctr.csv',
    auto_detect_cycles = True,
    peak_max_width = 2.5,
    ignore_qc_result = True,
    normalize_using_control = True,
    readout_analysis_column = 'readout_per_control_z_score',
    figure_export_file_type = 'png',
    run = False,
    init = True, # <---- see here
)

plug_exp_raw = exp.PlugExperiment(plugy_config)
16.11.20 15:53:32 - plugy.data.exp - INFO - Initializing PlugExperiment
16.11.20 15:53:32 - plugy.data.bd - INFO - Creating ChannelMap object from file /home/denes/Dokumentumok/microfluidics/vida/data/exp53/Channel_9_drugs_and_one_neg_Ctr.csv
16.11.20 15:53:32 - plugy.data.bd - INFO - Reading file /home/denes/Dokumentumok/microfluidics/vida/data/exp53/Channel_9_drugs_and_one_neg_Ctr.csv
16.11.20 15:53:32 - plugy.data.bd - INFO - Reading PlugSequence from /home/denes/Dokumentumok/microfluidics/vida/data/exp53/sequence_9_drugs_and_one neg_Ctr.csv
16.11.20 15:53:32 - plugy.data.pmt - INFO - Creating PmtData object from file /home/denes/Dokumentumok/microfluidics/vida/data/exp53/Exp53_1.txt
16.11.20 15:53:32 - plugy.data.pmt - INFO - Reading file /home/denes/Dokumentumok/microfluidics/vida/data/exp53/Exp53_1.txt
16.11.20 15:53:32 - plugy.data.pmt - INFO - Detected uncompressed txt file
16.11.20 15:53:33 - plugy.data.pmt - INFO - Correcting acquisition time

4.2. Only detect the plugs

For this pass the run = False and plugs = True parameters.

In [22]:
plugy_config = config.PlugyConfig(
    pmt_file = 'Exp53_1.txt',
    seq_file = 'sequence_9_drugs_and_one neg_Ctr.csv',
    channel_file = 'Channel_9_drugs_and_one_neg_Ctr.csv',
    auto_detect_cycles = True,
    peak_max_width = 2.5,
    ignore_qc_result = True,
    normalize_using_control = True,
    readout_analysis_column = 'readout_per_control_z_score',
    figure_export_file_type = 'png',
    run = False,
    plugs = True, # <---- see here
)

plug_exp_plugs = exp.PlugExperiment(plugy_config)
16.11.20 15:55:20 - plugy.data.exp - INFO - Initializing PlugExperiment
16.11.20 15:55:20 - plugy.data.bd - INFO - Creating ChannelMap object from file /home/denes/Dokumentumok/microfluidics/vida/data/exp53/Channel_9_drugs_and_one_neg_Ctr.csv
16.11.20 15:55:20 - plugy.data.bd - INFO - Reading file /home/denes/Dokumentumok/microfluidics/vida/data/exp53/Channel_9_drugs_and_one_neg_Ctr.csv
16.11.20 15:55:20 - plugy.data.bd - INFO - Reading PlugSequence from /home/denes/Dokumentumok/microfluidics/vida/data/exp53/sequence_9_drugs_and_one neg_Ctr.csv
16.11.20 15:55:20 - plugy.data.pmt - INFO - Creating PmtData object from file /home/denes/Dokumentumok/microfluidics/vida/data/exp53/Exp53_1.txt
16.11.20 15:55:20 - plugy.data.pmt - INFO - Reading file /home/denes/Dokumentumok/microfluidics/vida/data/exp53/Exp53_1.txt
16.11.20 15:55:20 - plugy.data.pmt - INFO - Detected uncompressed txt file
16.11.20 15:55:21 - plugy.data.pmt - INFO - Correcting acquisition time
16.11.20 15:55:21 - plugy.data.plug - INFO - Creating PlugData object
16.11.20 15:55:21 - plugy.data.plug - INFO - Finding plugs
16.11.20 15:55:22 - plugy.data.pmt - INFO - Merging plugs with centers closer than 0.20 seconds
Adjusting barcode detection [times=1]: 100%|██████████| 1/1 [00:00<00:00,  9.66it/s]
16.11.20 15:55:24 - plugy.data.plug - INFO - Found 2 cycles, 2 with the expected number of samples. Sample count deviations: 0=0, 1=0. Best barcode detection parameters: times=1.

4.3. Identify the samples without creating the figures

Sometimes you just want the data frame with the samples without generating all the figures. For this pass the run = False and samples = True parameters.

In [29]:
plugy_config = config.PlugyConfig(
    pmt_file = 'Exp53_1.txt',
    seq_file = 'sequence_9_drugs_and_one neg_Ctr.csv',
    channel_file = 'Channel_9_drugs_and_one_neg_Ctr.csv',
    auto_detect_cycles = True,
    peak_max_width = 2.5,
    ignore_qc_result = True,
    normalize_using_control = True,
    readout_analysis_column = 'readout_per_control_z_score',
    figure_export_file_type = 'png',
    run = False,
    samples = True, # <---- see here
)

plug_exp_norun = exp.PlugExperiment(plugy_config)
16.11.20 15:59:49 - plugy.data.exp - INFO - Initializing PlugExperiment
16.11.20 15:59:49 - plugy.data.bd - INFO - Creating ChannelMap object from file /home/denes/Dokumentumok/microfluidics/vida/data/exp53/Channel_9_drugs_and_one_neg_Ctr.csv
16.11.20 15:59:49 - plugy.data.bd - INFO - Reading file /home/denes/Dokumentumok/microfluidics/vida/data/exp53/Channel_9_drugs_and_one_neg_Ctr.csv
16.11.20 15:59:49 - plugy.data.bd - INFO - Reading PlugSequence from /home/denes/Dokumentumok/microfluidics/vida/data/exp53/sequence_9_drugs_and_one neg_Ctr.csv
16.11.20 15:59:49 - plugy.data.pmt - INFO - Creating PmtData object from file /home/denes/Dokumentumok/microfluidics/vida/data/exp53/Exp53_1.txt
16.11.20 15:59:49 - plugy.data.pmt - INFO - Reading file /home/denes/Dokumentumok/microfluidics/vida/data/exp53/Exp53_1.txt
16.11.20 15:59:49 - plugy.data.pmt - INFO - Detected uncompressed txt file
16.11.20 15:59:50 - plugy.data.pmt - INFO - Correcting acquisition time
16.11.20 15:59:50 - plugy.data.plug - INFO - Creating PlugData object
16.11.20 15:59:50 - plugy.data.plug - INFO - Finding plugs
16.11.20 15:59:51 - plugy.data.pmt - INFO - Merging plugs with centers closer than 0.20 seconds
Adjusting barcode detection [times=1]: 100%|██████████| 1/1 [00:00<00:00,  8.51it/s]
16.11.20 15:59:53 - plugy.data.plug - INFO - Found 2 cycles, 2 with the expected number of samples. Sample count deviations: 0=0, 1=0. Best barcode detection parameters: times=1.

16.11.20 15:59:54 - plugy.data.plug - INFO - Labelling samples with compound names
16.11.20 15:59:54 - plugy.data.exp - INFO - Calculating statistics

5. Adjusting sample and cycle detection

Due to various technical issues in the experiments, it might happen that in some of the cycles the number of identified samples doesn't match the expected number, hence the identity of the samples can't be determined. In most cases you can fix this error by increasing the maximum allowed plug length or by adjusting the ratio of the barcoding channel.

5.1. Adjusting the maximum plug length

You should do this for example if you have fused plugs. Use the peak_max_width parameter.

In [36]:
plugy_config = config.PlugyConfig(
    pmt_file = 'cycle_III_with adaptor_Not good/Exp54_cycle_III.txt',
    seq_file = 'sequence_9_drugs_and_pos_neg_Ctr.csv',
    channel_file = 'Channel_9_drugs_and_pos_neg_Ctr.csv',
    auto_detect_cycles = True,
    ignore_qc_result = True,
    normalize_using_control = True,
    readout_analysis_column = 'readout_per_control_z_score',
    figure_export_file_type = 'png',
    run = False,
    plugs = True,
)

plug_exp = exp.PlugExperiment(plugy_config)
16.11.20 16:30:40 - plugy.data.exp - INFO - Initializing PlugExperiment
16.11.20 16:30:40 - plugy.data.bd - INFO - Creating ChannelMap object from file /home/denes/Dokumentumok/microfluidics/vida/data/exp54/Channel_9_drugs_and_pos_neg_Ctr.csv
16.11.20 16:30:40 - plugy.data.bd - INFO - Reading file /home/denes/Dokumentumok/microfluidics/vida/data/exp54/Channel_9_drugs_and_pos_neg_Ctr.csv
16.11.20 16:30:40 - plugy.data.bd - INFO - Reading PlugSequence from /home/denes/Dokumentumok/microfluidics/vida/data/exp54/sequence_9_drugs_and_pos_neg_Ctr.csv
16.11.20 16:30:40 - plugy.data.pmt - INFO - Creating PmtData object from file /home/denes/Dokumentumok/microfluidics/vida/data/exp54/cycle_III_with adaptor_Not good/Exp54_cycle_III.txt
16.11.20 16:30:40 - plugy.data.pmt - INFO - Reading file /home/denes/Dokumentumok/microfluidics/vida/data/exp54/cycle_III_with adaptor_Not good/Exp54_cycle_III.txt
16.11.20 16:30:40 - plugy.data.pmt - INFO - Detected uncompressed txt file
16.11.20 16:30:40 - plugy.data.pmt - INFO - Correcting acquisition time
16.11.20 16:30:40 - plugy.data.plug - INFO - Creating PlugData object
16.11.20 16:30:40 - plugy.data.plug - INFO - Finding plugs
16.11.20 16:30:41 - plugy.data.pmt - INFO - Merging plugs with centers closer than 0.20 seconds
Adjusting barcode detection [times=1]: 100%|██████████| 1/1 [00:00<00:00, 20.70it/s]
16.11.20 16:30:41 - plugy.data.plug - INFO - Found 1 cycles, 0 with the expected number of samples. Sample count deviations: 0=-62. Best barcode detection parameters: times=1.

As you can see above: "Found 1 cycles, 0 with the expected number of samples". Notice that we passed the parameters run = False and plugs = True; if we went further than plug detection, we would get an error. Let's try with peak_max_width increased to 7 seconds:

In [37]:
plugy_config = config.PlugyConfig(
    pmt_file = 'cycle_III_with adaptor_Not good/Exp54_cycle_III.txt',
    seq_file = 'sequence_9_drugs_and_pos_neg_Ctr.csv',
    channel_file = 'Channel_9_drugs_and_pos_neg_Ctr.csv',
    auto_detect_cycles = True,
    peak_max_width = 7,
    ignore_qc_result = True,
    normalize_using_control = True,
    readout_analysis_column = 'readout_per_control_z_score',
    figure_export_file_type = 'png',
    run = False,
    plugs = True,
)

plug_exp = exp.PlugExperiment(plugy_config)
16.11.20 16:33:03 - plugy.data.exp - INFO - Initializing PlugExperiment
16.11.20 16:33:03 - plugy.data.bd - INFO - Creating ChannelMap object from file /home/denes/Dokumentumok/microfluidics/vida/data/exp54/Channel_9_drugs_and_pos_neg_Ctr.csv
16.11.20 16:33:03 - plugy.data.bd - INFO - Reading file /home/denes/Dokumentumok/microfluidics/vida/data/exp54/Channel_9_drugs_and_pos_neg_Ctr.csv
16.11.20 16:33:03 - plugy.data.bd - INFO - Reading PlugSequence from /home/denes/Dokumentumok/microfluidics/vida/data/exp54/sequence_9_drugs_and_pos_neg_Ctr.csv
16.11.20 16:33:03 - plugy.data.pmt - INFO - Creating PmtData object from file /home/denes/Dokumentumok/microfluidics/vida/data/exp54/cycle_III_with adaptor_Not good/Exp54_cycle_III.txt
16.11.20 16:33:03 - plugy.data.pmt - INFO - Reading file /home/denes/Dokumentumok/microfluidics/vida/data/exp54/cycle_III_with adaptor_Not good/Exp54_cycle_III.txt
16.11.20 16:33:03 - plugy.data.pmt - INFO - Detected uncompressed txt file
16.11.20 16:33:04 - plugy.data.pmt - INFO - Correcting acquisition time
16.11.20 16:33:04 - plugy.data.plug - INFO - Creating PlugData object
16.11.20 16:33:04 - plugy.data.plug - INFO - Finding plugs
16.11.20 16:33:04 - plugy.data.pmt - INFO - Merging plugs with centers closer than 0.20 seconds
Adjusting barcode detection [times=1]: 100%|██████████| 1/1 [00:00<00:00, 15.02it/s]
16.11.20 16:33:06 - plugy.data.plug - INFO - Found 1 cycles, 1 with the expected number of samples. Sample count deviations: 0=0. Best barcode detection parameters: times=1.

This way we could detect the long, fused plugs, so the number of samples matches.
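
When trying several settings, it can be handy to read the cycle summary from the log programmatically instead of by eye. The sketch below parses the message format seen in the log output above with a regular expression; the function is hypothetical, not part of plugy, and the message format may differ between plugy versions.

```python
import re

SUMMARY = re.compile(
    r'Found (?P<cycles>\d+) cycles, (?P<ok>\d+) with the expected number '
    r'of samples\. Sample count deviations: (?P<deviations>[^.]*)\.'
)


def parse_cycle_summary(line):
    """Extract cycle counts and per-cycle sample count deviations from a log line."""
    match = SUMMARY.search(line)
    if match is None:
        return None
    deviations = {}
    # The deviations look like "0=0, 1=0": cycle number, then the deviation
    for item in match.group('deviations').split(','):
        cycle, deviation = item.strip().split('=')
        deviations[int(cycle)] = int(deviation)
    return {
        'cycles': int(match.group('cycles')),
        'as_expected': int(match.group('ok')),
        'deviations': deviations,
    }


# The two messages from this section: default and increased peak_max_width
before = ('Found 1 cycles, 0 with the expected number of samples. '
          'Sample count deviations: 0=-62. Best barcode detection parameters: times=1.')
after = ('Found 1 cycles, 1 with the expected number of samples. '
         'Sample count deviations: 0=0. Best barcode detection parameters: times=1.')

print(parse_cycle_summary(before))
print(parse_cycle_summary(after))
```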

5.2. Barcode ratio

The barcoding blue dye is not always the highest signal in the barcode plugs, and sometimes it is also the highest in some sample plugs. You can adjust the ratio of the barcoding channel manually, or you can scan a range to find the ratio that results in the highest number of detected cycles. For this, set barcoding_method to "adaptive" and use the barcoding_param option. Here we scan a range from 0.3 to 1.6:

In [39]:
plugy_config = config.PlugyConfig(
    pmt_file = 'cycle_III_with adaptor_Not good/Exp54_cycle_III.txt',
    seq_file = 'sequence_9_drugs_and_pos_neg_Ctr.csv',
    channel_file = 'Channel_9_drugs_and_pos_neg_Ctr.csv',
    auto_detect_cycles = True,
    peak_max_width = 7,
    barcoding_method = 'adaptive',
    barcoding_param = {'ratio': (0.3, 1.6)},
    ignore_qc_result = True,
    normalize_using_control = True,
    readout_analysis_column = 'readout_per_control_z_score',
    figure_export_file_type = 'png',
    run = False,
    plugs = True,
)

plug_exp = exp.PlugExperiment(plugy_config)
16.11.20 16:37:05 - plugy.data.exp - INFO - Initializing PlugExperiment
16.11.20 16:37:05 - plugy.data.bd - INFO - Creating ChannelMap object from file /home/denes/Dokumentumok/microfluidics/vida/data/exp54/Channel_9_drugs_and_pos_neg_Ctr.csv
16.11.20 16:37:05 - plugy.data.bd - INFO - Reading file /home/denes/Dokumentumok/microfluidics/vida/data/exp54/Channel_9_drugs_and_pos_neg_Ctr.csv
16.11.20 16:37:05 - plugy.data.bd - INFO - Reading PlugSequence from /home/denes/Dokumentumok/microfluidics/vida/data/exp54/sequence_9_drugs_and_pos_neg_Ctr.csv
16.11.20 16:37:05 - plugy.data.pmt - INFO - Creating PmtData object from file /home/denes/Dokumentumok/microfluidics/vida/data/exp54/cycle_III_with adaptor_Not good/Exp54_cycle_III.txt
16.11.20 16:37:05 - plugy.data.pmt - INFO - Reading file /home/denes/Dokumentumok/microfluidics/vida/data/exp54/cycle_III_with adaptor_Not good/Exp54_cycle_III.txt
16.11.20 16:37:05 - plugy.data.pmt - INFO - Detected uncompressed txt file
16.11.20 16:37:06 - plugy.data.pmt - INFO - Correcting acquisition time
16.11.20 16:37:06 - plugy.data.plug - INFO - Creating PlugData object
16.11.20 16:37:06 - plugy.data.plug - INFO - Finding plugs
16.11.20 16:37:07 - plugy.data.pmt - INFO - Merging plugs with centers closer than 0.20 seconds
Adjusting barcode detection [adaptive_method=simple, block_size=7, higher_threshold_factor=1, ratio=1.6, thresholding_method=local]: 100%|██████████| 21/21 [00:01<00:00, 16.56it/s] 
16.11.20 16:37:09 - plugy.data.plug - INFO - Found 1 cycles, 0 with the expected number of samples. Sample count deviations: 0=26. Best barcode detection parameters: adaptive_method=simple, block_size=7, higher_threshold_factor=1, ratio=0.3, thresholding_method=local.

In this example the ratio doesn't really matter, so the value 0.3 was selected in the end.
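
The idea of scanning a range and keeping the best value can be illustrated with a generic sketch. It is not plugy's implementation: the 21 steps are taken from the progress bar in the log above, while the even spacing, the scoring function and the tie-breaking rule are assumptions made for illustration. Under these assumptions, a score that is the same for every ratio leads to the first value of the range, 0.3.

```python
def scan_range(low, high, steps):
    """Evenly spaced values from low to high, both ends included."""
    return [low + i * (high - low) / (steps - 1) for i in range(steps)]


def best_parameter(values, score):
    """Return the value with the highest score; ties go to the earliest value."""
    # max() keeps the first of several equally good candidates
    return max(values, key=score)


def constant_score(ratio):
    """Hypothetical score: every ratio yields the same number of good cycles."""
    return 0


ratios = scan_range(0.3, 1.6, 21)
best = best_parameter(ratios, constant_score)
print(best)
```

With a score that actually depends on the ratio, for example the number of cycles with the expected sample count, the same function returns the first ratio that reaches the highest score.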

5.3. Further parameters

If the two methods above don't work, first inspect the plot of the raw data (pmt_overview.png) and try to find out what the issue is. Sometimes you will see that between certain samples there is only one barcode plug, or more than 12, or that between the cycles there are fewer than 12 barcode plugs. Or certain samples contain fewer than 3 plugs. To address these issues you can adjust the min_between_samples_barcodes, min_end_cycle_barcodes or min_plugs_in_sample parameters. There are many further parameters to fine-tune the sample detection, but these are beyond the scope of this tutorial.
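
Besides looking at the plot, you can count the consecutive barcode and sample plugs to find the places where the sequence looks unusual. The sketch below works on a plain sequence of True/False barcode flags, such as the barcode column of the plug data frame. The helper functions and the thresholds are illustrative assumptions, not plugy defaults.

```python
from itertools import groupby


def plug_runs(barcode_flags):
    """Collapse a per-plug barcode flag sequence into (is_barcode, n_plugs) runs."""
    return [(flag, len(list(group))) for flag, group in groupby(barcode_flags)]


def short_runs(barcode_flags, min_barcodes=2, min_plugs=3):
    """Runs with fewer barcode or sample plugs than the given minimum.

    Each run is returned as (run_index, is_barcode, n_plugs).
    """
    short = []
    for i, (is_barcode, n) in enumerate(plug_runs(barcode_flags)):
        minimum = min_barcodes if is_barcode else min_plugs
        if n < minimum:
            short.append((i, is_barcode, n))
    return short


# Made-up sequence for illustration: B = barcode plug, S = sample plug
flags = [c == 'B' for c in 'BBBSSSSSBSSSSBBBSSBBB']

print(plug_runs(flags))
print(short_runs(flags))
```

Here the third run is a single barcode plug and the sixth run is a sample with only two plugs, the kind of pattern the parameters above are meant to handle. With real data, list(plug_exp.plug_data.plug_df['barcode']) can take the place of the made-up sequence.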

6. Merging multiple experiments

Sometimes, for technical reasons, the data has to be recorded in multiple sessions, but for the analysis and visualization you want to concatenate these sequences. For this, create a config and an experiment object for each recording except the first one, using the run = False and samples = True parameters:

In [40]:
plugy_config_ii = config.PlugyConfig(
    pmt_file = 'cycle_II_wo adaptor/Exp54_cycle_II_1.txt',
    seq_file = 'sequence_9_drugs_and_pos_neg_Ctr.csv',
    channel_file = 'Channel_9_drugs_and_pos_neg_Ctr.csv',
    auto_detect_cycles = True,
    peak_max_width = 2.5,
    ignore_qc_result = True,
    normalize_using_control = True,
    readout_analysis_column = 'readout_per_control_z_score',
    figure_export_file_type = 'png',
    # we don't want to run the entire thing
    run = False,
    # but want to run until sample detection
    samples = True,
)

plugy_config_iii = config.PlugyConfig(
    pmt_file = 'cycle_III_with adaptor_Not good/Exp54_cycle_III.txt',
    seq_file = 'sequence_9_drugs_and_pos_neg_Ctr.csv',
    channel_file = 'Channel_9_drugs_and_pos_neg_Ctr.csv',
    auto_detect_cycles = True,
    peak_max_width = 7,
    ignore_qc_result = True,
    normalize_using_control = True,
    readout_analysis_column = 'readout_per_control_z_score',
    figure_export_file_type = 'png',
    # we don't want to run the entire thing
    run = False,
    # but want to run until sample detection
    samples = True,
)

plug_exp_ii = exp.PlugExperiment(plugy_config_ii)
plug_exp_iii = exp.PlugExperiment(plugy_config_iii)
16.11.20 17:13:09 - plugy.data.exp - INFO - Initializing PlugExperiment
16.11.20 17:13:09 - plugy.data.bd - INFO - Creating ChannelMap object from file /home/denes/Dokumentumok/microfluidics/vida/data/exp54/Channel_9_drugs_and_pos_neg_Ctr.csv
16.11.20 17:13:09 - plugy.data.bd - INFO - Reading file /home/denes/Dokumentumok/microfluidics/vida/data/exp54/Channel_9_drugs_and_pos_neg_Ctr.csv
16.11.20 17:13:09 - plugy.data.bd - INFO - Reading PlugSequence from /home/denes/Dokumentumok/microfluidics/vida/data/exp54/sequence_9_drugs_and_pos_neg_Ctr.csv
16.11.20 17:13:09 - plugy.data.pmt - INFO - Creating PmtData object from file /home/denes/Dokumentumok/microfluidics/vida/data/exp54/cycle_II_wo adaptor/Exp54_cycle_II_1.txt
16.11.20 17:13:09 - plugy.data.pmt - INFO - Reading file /home/denes/Dokumentumok/microfluidics/vida/data/exp54/cycle_II_wo adaptor/Exp54_cycle_II_1.txt
16.11.20 17:13:09 - plugy.data.pmt - INFO - Detected uncompressed txt file
16.11.20 17:13:10 - plugy.data.pmt - INFO - Correcting acquisition time
16.11.20 17:13:10 - plugy.data.plug - INFO - Creating PlugData object
16.11.20 17:13:10 - plugy.data.plug - INFO - Finding plugs
16.11.20 17:13:11 - plugy.data.pmt - INFO - Merging plugs with centers closer than 0.20 seconds
Adjusting barcode detection [times=1]: 100%|██████████| 1/1 [00:00<00:00, 13.65it/s]
16.11.20 17:13:12 - plugy.data.plug - INFO - Found 1 cycles, 1 with the expected number of samples. Sample count deviations: 0=0. Best barcode detection parameters: times=1.
16.11.20 17:13:12 - plugy.data.plug - INFO - Labelling samples with compound names

16.11.20 17:13:12 - plugy.data.exp - INFO - Calculating statistics
16.11.20 17:13:12 - plugy.data.exp - INFO - Initializing PlugExperiment
16.11.20 17:13:12 - plugy.data.bd - INFO - Creating ChannelMap object from file /home/denes/Dokumentumok/microfluidics/vida/data/exp54/Channel_9_drugs_and_pos_neg_Ctr.csv
16.11.20 17:13:12 - plugy.data.bd - INFO - Reading file /home/denes/Dokumentumok/microfluidics/vida/data/exp54/Channel_9_drugs_and_pos_neg_Ctr.csv
16.11.20 17:13:12 - plugy.data.bd - INFO - Reading PlugSequence from /home/denes/Dokumentumok/microfluidics/vida/data/exp54/sequence_9_drugs_and_pos_neg_Ctr.csv
16.11.20 17:13:12 - plugy.data.pmt - INFO - Creating PmtData object from file /home/denes/Dokumentumok/microfluidics/vida/data/exp54/cycle_III_with adaptor_Not good/Exp54_cycle_III.txt
16.11.20 17:13:12 - plugy.data.pmt - INFO - Reading file /home/denes/Dokumentumok/microfluidics/vida/data/exp54/cycle_III_with adaptor_Not good/Exp54_cycle_III.txt
16.11.20 17:13:12 - plugy.data.pmt - INFO - Detected uncompressed txt file
16.11.20 17:13:13 - plugy.data.pmt - INFO - Correcting acquisition time
16.11.20 17:13:14 - plugy.data.plug - INFO - Creating PlugData object
16.11.20 17:13:14 - plugy.data.plug - INFO - Finding plugs
16.11.20 17:13:14 - plugy.data.pmt - INFO - Merging plugs with centers closer than 0.20 seconds
Adjusting barcode detection [times=1]: 100%|██████████| 1/1 [00:00<00:00, 15.51it/s]
16.11.20 17:13:15 - plugy.data.plug - INFO - Found 1 cycles, 1 with the expected number of samples. Sample count deviations: 0=0. Best barcode detection parameters: times=1.
16.11.20 17:13:15 - plugy.data.plug - INFO - Labelling samples with compound names
16.11.20 17:13:15 - plugy.data.exp - INFO - Calculating statistics

Finally, create a config and an experiment for the first recording, and pass the experiment objects of all subsequent recordings in the append parameter.

In [41]:
plugy_config = config.PlugyConfig(
    pmt_file = 'cycle I wo adaptor/exp54_cycle_I.txt',
    seq_file = 'sequence_9_drugs_and_pos_neg_Ctr.csv',
    channel_file = 'Channel_9_drugs_and_pos_neg_Ctr.csv',
    auto_detect_cycles = True,
    peak_max_width = 2.5,
    ignore_qc_result = True,
    normalize_using_control = True,
    readout_analysis_column = 'readout_per_control_z_score',
    figure_export_file_type = 'png',
    # we want to merge these two experiments,
    # appending them to the end of the first one:
    append = (plug_exp_ii, plug_exp_iii),
)


plug_exp = exp.PlugExperiment(plugy_config)
16.11.20 17:14:37 - plugy.data.exp - INFO - Initializing PlugExperiment
16.11.20 17:14:37 - plugy.data.bd - INFO - Creating ChannelMap object from file /home/denes/Dokumentumok/microfluidics/vida/data/exp54/Channel_9_drugs_and_pos_neg_Ctr.csv
16.11.20 17:14:37 - plugy.data.bd - INFO - Reading file /home/denes/Dokumentumok/microfluidics/vida/data/exp54/Channel_9_drugs_and_pos_neg_Ctr.csv
16.11.20 17:14:37 - plugy.data.bd - INFO - Reading PlugSequence from /home/denes/Dokumentumok/microfluidics/vida/data/exp54/sequence_9_drugs_and_pos_neg_Ctr.csv
16.11.20 17:14:37 - plugy.data.pmt - INFO - Creating PmtData object from file /home/denes/Dokumentumok/microfluidics/vida/data/exp54/cycle I wo adaptor/exp54_cycle_I.txt
16.11.20 17:14:37 - plugy.data.pmt - INFO - Reading file /home/denes/Dokumentumok/microfluidics/vida/data/exp54/cycle I wo adaptor/exp54_cycle_I.txt
16.11.20 17:14:37 - plugy.data.pmt - INFO - Detected uncompressed txt file
16.11.20 17:14:38 - plugy.data.pmt - INFO - Correcting acquisition time
16.11.20 17:14:39 - plugy.data.plug - INFO - Creating PlugData object
16.11.20 17:14:39 - plugy.data.plug - INFO - Finding plugs
16.11.20 17:14:39 - plugy.data.pmt - INFO - Merging plugs with centers closer than 0.20 seconds
Adjusting barcode detection [times=1]: 100%|██████████| 1/1 [00:00<00:00, 13.27it/s]
16.11.20 17:14:40 - plugy.data.plug - INFO - Found 1 cycles, 1 with the expected number of samples. Sample count deviations: 0=0. Best barcode detection parameters: times=1.
16.11.20 17:14:40 - plugy.data.plug - INFO - Labelling samples with compound names

16.11.20 17:14:41 - plugy.data.plug - INFO - Found 2 cycles, 2 with the expected number of samples. Sample count deviations: 0=0, 1=0. Best barcode detection parameters: times=1.
16.11.20 17:14:41 - plugy.data.plug - INFO - Found 3 cycles, 3 with the expected number of samples. Sample count deviations: 0=0, 1=0, 2=0. Best barcode detection parameters: times=1.
16.11.20 17:14:41 - plugy.data.exp - INFO - Calculating statistics
/home/denes/Dokumentumok/microfluidics/plugy_new/notebooks/plugy/exp.py:252: UserWarning: Tight layout not applied. The left and right margins cannot be made large enough to accommodate all axes decorations. 
  pmt_overview_fig.tight_layout()
16.11.20 17:14:52 - plugy.data.exp - INFO - Plotted PMT data to /home/denes/Dokumentumok/microfluidics/vida/data/exp53/results/qc/pmt_overview.png
16.11.20 17:15:21 - plugy.data.exp - WARNING - Contamination above threshold (0.08873243148938015 > 0.03)
Traceback (most recent call last):
  File "/home/denes/Dokumentumok/microfluidics/plugy_new/notebooks/plugy/exp.py", line 524, in plot_sample_cycles
    self.plug_data.plot_sample_cycles()
  File "/home/denes/Dokumentumok/microfluidics/plugy_new/notebooks/plugy/data/plug.py", line 1189, in plot_sample_cycles
    sample_cycle_ax[idx_y][idx_x] = self.plot_sample(
  File "/home/denes/Dokumentumok/microfluidics/plugy_new/notebooks/plugy/data/plug.py", line 1223, in plot_sample
    axes = self.plot_plug_pmt_data(axes = axes, cut = (peak_data.start_time.min() - offset, peak_data.end_time.max() + offset))
  File "/home/denes/Dokumentumok/microfluidics/plugy_new/notebooks/plugy/data/plug.py", line 629, in plot_plug_pmt_data
    axes = self.pmt_data.plot_pmt_data(axes, cut = cut)
  File "/home/denes/Dokumentumok/microfluidics/plugy_new/notebooks/plugy/data/pmt.py", line 299, in plot_pmt_data
    axes.set_xticks(range(int(round(df.time.min())), int(round(df.time.max())), major_tick_freq), minor = False)
ValueError: cannot convert float NaN to integer
16.11.20 17:15:34 - plugy.data.exp - ERROR - Failed to plot sample cycles
16.11.20 17:15:34 - plugy.data.exp - CRITICAL - Quality control failed due to the following reasons: Contamination above threshold (0.08873243148938015 > 0.03). See also the QC plots for more information. In case you still want to continue, you can set the `ignore_qc_result` config parameter to True.
16.11.20 17:15:34 - plugy.data.exp - INFO - Running drug combination analysis
16.11.20 17:15:37 - plugy.data.exp - INFO - Saving violin plots to /home/denes/Dokumentumok/microfluidics/vida/data/exp53/results/drug_comb_z_violins.png
16.11.20 17:15:47 - plugy.data.exp - INFO - Saving violin plots to /home/denes/Dokumentumok/microfluidics/vida/data/exp53/results/drug_comb_z_violins_by-cycle.png
16.11.20 17:15:55 - plugy.data.exp - INFO - Saving heatmap(s) to /home/denes/Dokumentumok/microfluidics/vida/data/exp53/results/drug_comb_z_heatmap.png
16.11.20 17:15:59 - plugy.data.exp - INFO - Saving heatmap(s) to /home/denes/Dokumentumok/microfluidics/vida/data/exp53/results/drug_comb_z_heatmap_by-cycle.png

As you can see, we ended up with 3 cycles from the three acquisition sessions. All the data have been combined in the data frames of the last object:

In [42]:
plug_exp.plug_data.sample_df
Out[42]:
level_0 index start_time end_time barcode_peak_median control_peak_median readout_peak_median readout_peak_z_score readout_per_control_z_score readout_per_control cycle_nr sample_nr name compound_a compound_b readout_media_norm readout_media_norm_z_score length
0 0 23.0 1481.650667 1484.040667 0.010987 0.239265 0.280160 -0.795588 -0.286266 1.170919 0 0 Cell Control FS FS 1.111713 -0.834247 2.390000
1 1 24.0 1484.994000 1487.007333 0.009766 0.258797 0.270394 -0.901775 -0.675440 1.044811 0 0 Cell Control FS FS 0.992203 -1.179383 2.013333
2 2 25.0 1491.394000 1492.187333 0.009461 0.264290 0.157170 -2.132872 -2.064537 0.594688 0 0 Cell Control FS FS 0.564986 -2.413158 0.793333
3 3 26.0 1493.367333 1495.810667 0.009766 0.260628 0.307321 -0.500263 -0.260848 1.179156 0 0 Cell Control FS FS 1.120412 -0.809126 2.443333
4 4 27.0 1496.714000 1498.007333 0.009156 0.278939 0.293436 -0.651242 -0.653347 1.051970 0 0 Cell Control FS FS 0.999786 -1.157484 1.293333
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1909 1174 NaN 16475.934333 16478.211000 0.009766 0.319529 0.263985 -0.877151 -1.468434 0.826169 2 73 Cell Control FS FS 1.026962 -1.122568 2.276667
1910 1175 NaN 16478.787667 16481.061000 0.010071 0.311289 0.274972 -0.795263 -1.356541 0.883333 2 73 Cell Control FS FS 1.098463 -0.950920 2.273333
1911 1176 NaN 16481.871000 16485.797667 0.009766 0.299081 0.236518 -1.081868 -1.537635 0.790816 2 73 Cell Control FS FS 0.983842 -1.226086 3.926667
1912 1177 NaN 16487.287667 16491.307667 0.010071 0.313425 0.271310 -0.822556 -1.391195 0.865630 2 73 Cell Control FS FS 1.077742 -1.000663 4.020000
1913 1178 NaN 16492.024333 16493.337667 0.009766 0.317087 0.267342 -0.852131 -1.435257 0.843119 2 73 Cell Control FS FS 1.050419 -1.066256 1.313333

1914 rows × 18 columns
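
To check that each acquisition session contributed a cycle, the combined table can be summarised by cycle_nr. The snippet below is only a sketch: it builds a small stand-in data frame with made-up values and a subset of the column names shown above, so it runs without plugy. With a real experiment you would use plug_exp.plug_data.sample_df in its place.

```python
import pandas as pd

# Stand-in for plug_exp.plug_data.sample_df: the column names are taken
# from the output above, the values are invented for illustration only.
sample_df = pd.DataFrame({
    'cycle_nr': [0, 0, 1, 1, 2, 2],
    'sample_nr': [0, 1, 0, 1, 0, 1],
    'readout_per_control_z_score': [-0.5, 0.5, -1.0, 1.0, 0.0, 2.0],
})

# Each row is one plug: count the plugs and average the readout
# z-score within each cycle.
per_cycle = (
    sample_df
    .groupby('cycle_nr')['readout_per_control_z_score']
    .agg(['count', 'mean'])
)

print(per_cycle)
```

Grouping by both cycle_nr and sample_nr in the same way gives one row per sample and cycle, which helps to spot samples that are missing or short in one of the recordings.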

7. Saving the results of subsequent runs into separate directories

By default plugy saves the figures into the results directory. You can set a different directory with the results_base_dir option. At each subsequent run in the same working directory, the figures are overwritten. To keep the old figures and create a new directory for each run, use the option result_subdirs = True. To add a timestamp to these directory names, also set timestamp_result_subdirs = True. Before version 0.5.0 this was the default behaviour.

In [ ]: