A Theory for High Dimensional Fluorescence Barcoding of Single Cells

Will Dodd, Maddie McCarthy, Marc Birtwistle


Current methods for genetic screening in cells (such as CRISPR screens) are unable to track more than one gene at a time. This is a problem because most tumors rely on 2 to 8 driver mutations. It is therefore critically important to develop approaches for unbiased assessment of genetic interactions in order to better understand how tumor genotypes dictate drug sensitivities. In this paper, we present a theoretical validation of an approach to genetic screening using fluorescent proteins called Multiplexing with Spectral Imaging and Combinatorics (MuSIC). This approach links fluorescent proteins to create MuSIC probes with unique emission spectra; the spectra of mixed probes can then be unmixed through matrix algebra. Probes can be further combined in pairs to create MuSIC barcodes, which can then be associated with gRNAs to track genetic interactions inside cells. Method: We accomplished this with a MATLAB® program designed to simulate the presence of these barcodes in a cell and rate how effectively the computer is able to distinguish the barcodes present. The computer’s effectiveness is rated with the Matthews correlation coefficient (MCC), which measures the quality of a binary classifier on a scale from −1 to 1, where 1 indicates perfect classification. Results: When all possible probes were included in the simulation, approximately 26% of the probes (45 of 175) had MCCs greater than 0.75, indicating strong agreement between the predicted and the actual presence of those probes. Additionally, once the probe pool was trimmed to 10 of the original 175 probes, 100% classification accuracy was achieved. This suggests that physical experiments using this concept are feasible and should be able to track genetic interactions within cells.



Genetic screening is quickly becoming commonplace and essential in medicine today, especially in cancer treatment. Unfortunately, most screening methods cannot screen more than one gene at a time, whereas most tumors harbor anywhere from 2 to 8 driver mutations that cooperate, or interact, to create the phenotypes observed in many cancers (1,2). A new method of genetic screening that can follow and detect the interactions between multiple genes is therefore needed. In this study we aim to present a theoretical validation of one such method based on MuSIC (Multiplexing with Spectral Imaging and Combinatorics), an approach to fluorescence multiplexing previously developed by our lab.

MuSIC uses spectrally distinct “probes” that can be measured simultaneously in solution (Fig 1). A MuSIC probe is either an individual fluorophore or a stable two- or three-way combination of individual fluorophores. As a simple example, consider the fluorescent proteins (FPs) mTFP1 and mVenus. When the emission spectra of these two FPs are recorded separately, each shows its characteristic reference spectrum (Fig 1A, left and middle). However, mTFP1 is also a significant FRET (Förster resonance energy transfer) donor to mVenus, so the two FPs can be combined into an mTFP1-mVenus heterodimer whose spectral signature is distinct from that of either FP alone (Fig 1A, right). If these three MuSIC probes are then mixed together in solution, the emission spectrum of the mixture can be unmixed against the reference spectrum of each probe to determine how much of each probe is present (Fig 1B, Fig 1C). The natural follow-up question is how many distinguishable probes can be formed from a given set of FPs. A previous study by our lab investigated these limits and found that from a starting set of 16 FPs, 175 reliable MuSIC probes could be formed (3). In this study, we extend this concept to MuSIC ‘barcodes’, which are two-way combinations of probes. Approximately 4000 barcodes can be formed from the 175 MuSIC probes, allowing many genetic interactions to be tracked in cells (Fig 2).
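The unmixing step described above amounts to solving a linear system: if each column of a matrix holds one probe's reference spectrum, the measured spectrum of a mixture is approximately that matrix times the vector of probe amounts. The Python sketch below recovers the amounts by ordinary least squares with NumPy. It is not the lab's MATLAB code, and the wavelength grid, the three Gaussian spectra, and the names `gaussian` and `unmix` are hypothetical placeholders rather than measured mTFP1, mVenus, or mTFP1-mVenus data.

```python
import numpy as np

# Hypothetical wavelength grid and spectra: these Gaussians are placeholders,
# NOT measured mTFP1, mVenus, or mTFP1-mVenus reference spectra.
wavelengths = np.linspace(450.0, 650.0, 41)


def gaussian(peak, width):
    """Placeholder emission spectrum: a Gaussian bump on the wavelength grid."""
    return np.exp(-0.5 * ((wavelengths - peak) / width) ** 2)


# Reference matrix: one column per probe (channels x probes).
references = np.column_stack([
    gaussian(492.0, 20.0),  # stand-in for a donor-only probe
    gaussian(528.0, 20.0),  # stand-in for an acceptor-only probe
    gaussian(510.0, 35.0),  # stand-in for a FRET heterodimer probe
])


def unmix(mixture, references):
    """Least-squares estimate of probe amounts c such that references @ c ≈ mixture."""
    amounts, *_ = np.linalg.lstsq(references, mixture, rcond=None)
    return amounts


true_amounts = np.array([1.0, 0.5, 2.0])
mixture = references @ true_amounts  # noise-free simulated measurement
estimated = unmix(mixture, references)
```

With noise-free input the estimate matches `true_amounts` up to floating-point error. Because probe amounts cannot be negative, a constrained solver such as `scipy.optimize.nnls` may suit noisy measurements better; ordinary least squares is used here only to keep the sketch minimal.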

Figure 1

Figure 2


To validate the MuSIC barcode approach to genetic interaction screening, three separate computer simulations were run. The first simulation generated an equiconcentrated mixture of 4 probes chosen at random from the list of 175; the computer then attempted to classify each probe as present in or absent from the mixture based on a threshold unique to that probe (see Appendix A) (Fig 3A). This was repeated many times to build a data set large enough to evaluate the accuracy of the classification. We used the Matthews correlation coefficient (MCC) to measure this accuracy; MCC is increasingly favored for evaluating binary classifiers because it accounts for imbalances in the data (4). This matters here because any given probe is absent from the large majority of simulated mixtures, so a classifier that always answered “absent” would achieve a high raw accuracy while detecting nothing. After the primary simulation, we developed a second simulation to test the effect of shrinking the pool of probes that could be present in solution. This simulation proceeded almost identically to the first; however, after each data set had been generated, the probe that the computer classified least accurately was removed from the sample pool, preventing it from being simulated again (Fig 3B). This was repeated until only 4 probes were left in the sample pool. Finally, a third simulation investigated whether, in a smaller sample pool, certain probes performed better than others. In this simulation, the sample pool consisted of a random set of 10 of the original 175 probes rather than all 175; the simulation then proceeded identically to the primary simulation.
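To make the imbalance argument concrete, the sketch below computes the MCC from confusion-matrix counts, MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), and applies it to hypothetical counts for one probe that is present in 23 of 1000 mixtures (roughly the 4-in-175 rate of the first simulation). Returning 0 when the denominator vanishes is a common convention assumed here, not necessarily what the original MATLAB program does.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any row or column sum of the confusion matrix is zero,
    a common convention (assumed here) for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for one probe over 1000 simulated mixtures in which it was
# present 23 times, scored by a classifier that always answers "absent".
tp, tn, fp, fn = 0, 977, 0, 23
accuracy = (tp + tn) / (tp + tn + fp + fn)  # 0.977, despite detecting nothing
score = mcc(tp, tn, fp, fn)                 # 0.0, exposing the useless classifier
```

Raw accuracy rewards the trivial "always absent" answer, while MCC only approaches 1 when both presence and absence are called correctly.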

Figure 3
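The three simulations can be sketched in Python as below. This is an illustration under stated assumptions, not the MATLAB program used in the study: the reference spectra are random placeholders, Gaussian measurement noise with standard deviation `noise` is assumed, a single shared `threshold` stands in for the per-probe thresholds of Appendix A, "classified least accurately" is taken to mean lowest MCC, and the helper names `score_pool`, `prune`, and `random_pool_mean` are hypothetical. `score_pool` follows the first simulation, `prune` repeatedly drops the lowest-MCC probe as in the second, and `random_pool_mean` scores one random pool as in the third.

```python
import numpy as np


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined (assumed convention)."""
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return 0.0 if denom == 0 else float((tp * tn - fp * fn) / denom)


def score_pool(refs, pool, n_trials, k, noise, threshold, rng):
    """Per-probe MCC over repeated equiconcentrated k-probe mixtures drawn from `pool`.

    refs: placeholder reference spectra, shape (n_channels, n_total_probes).
    pool: list of column indices into refs that may appear in a mixture.
    """
    sub = refs[:, pool]
    counts = np.zeros((len(pool), 4), dtype=int)  # columns: tp, tn, fp, fn
    for _ in range(n_trials):
        truth = np.zeros(len(pool), dtype=bool)
        truth[rng.choice(len(pool), size=k, replace=False)] = True
        # Equal amount (1.0) of each present probe, plus assumed Gaussian noise.
        mixture = sub[:, truth].sum(axis=1)
        mixture = mixture + rng.normal(0.0, noise, size=mixture.shape)
        # Linear unmixing against the pool's reference spectra.
        amounts, *_ = np.linalg.lstsq(sub, mixture, rcond=None)
        called = amounts > threshold  # one shared threshold (simplification)
        counts[:, 0] += truth & called
        counts[:, 1] += ~truth & ~called
        counts[:, 2] += ~truth & called
        counts[:, 3] += truth & ~called
    return {p: mcc(*counts[i]) for i, p in enumerate(pool)}


def prune(refs, k=4, n_trials=200, noise=0.05, threshold=0.5, seed=0):
    """Simulation-2-style loop: drop the lowest-MCC probe until k probes remain."""
    rng = np.random.default_rng(seed)
    pool = list(range(refs.shape[1]))
    history = []  # (pool size, mean per-probe MCC) recorded before each removal
    while len(pool) > k:
        scores = score_pool(refs, pool, n_trials, k, noise, threshold, rng)
        history.append((len(pool), float(np.mean(list(scores.values())))))
        pool.remove(min(scores, key=scores.get))
    return pool, history


def random_pool_mean(refs, pool_size=10, k=4, n_trials=200, noise=0.05,
                     threshold=0.5, seed=0):
    """Simulation-3-style run: mean per-probe MCC for one random pool of probes."""
    rng = np.random.default_rng(seed)
    pool = [int(p) for p in rng.choice(refs.shape[1], size=pool_size, replace=False)]
    scores = score_pool(refs, pool, n_trials, k, noise, threshold, rng)
    return float(np.mean(list(scores.values())))
```

For example, `prune(np.random.default_rng(1).random((60, 12)))` shrinks a 12-probe placeholder library to 4 probes and returns the mean MCC recorded at each pool size, the kind of curve summarized in Fig 4; whether that curve rises for a given library depends on its spectra.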


After examining the data from the first simulation, it was evident that the primary simulation was successful on its own. Out of the 175 probes simulated, 45 had MCC scores over 0.75 (Table 1). This strongly supports the validity of our approach to genetic interaction screening, as these results show that many probes can be accurately distinguished when they are present in a simulated cell. The results from simulation 2 indicate that classification accuracy improves as the sample pool shrinks: the average probe MCC in simulation 2 increases as the number of probes in the pool decreases (Fig 4). Additionally, simulation 2 found that the ideal sample pool size is 10 probes, the largest pool for which the computer classified every probe correctly in every trial. Simulation 3 was then run to determine whether this success depended on which probes made up the small sample pool. Its results indicate that classification remains accurate no matter which 10 probes are in the sample pool: the average MCC over 100 trials of simulation 3 was 0.9906. Because an MCC of 1 indicates perfect classification, an average this close to 1 suggests that the choice of probes forming the 10-probe sample pool matters little, as long as the pool contains only 10 probes.

Table 1

Figure 4


In conclusion, the above data suggest that this is a very plausible experiment to perform in vitro. The computer’s success in determining the presence of probes in solution suggests that the same should be achievable in real cells. Future steps include an experimental validation of this concept, which could then be followed by experiments aimed at genetic interaction screening.


  1. Behan, F. M. et al. Prioritization of cancer therapeutic targets using CRISPR–Cas9 screens. Nature 568, 511–516 (2019).
  2. Vogelstein, B. et al. Cancer Genome Landscapes. Science 339, 1546–1558 (2013).
  3. Holzapfel, H. Y. et al. Fluorescence Multiplexing with Spectral Imaging and Combinatorics. ACS Comb. Sci. 20, 653–659 (2018).
  4. Chicco, D. & Jurman, G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics 21, 6 (2020). doi:10.1186/s12864-019-6413-7.