DNA Computing
 Member: Bing Hu     QiKai Xu     Chenjue Wang     Xiaoyang Kuang
 

Please click here to see the presentation slides

 

DNA Computing (DNAC)

 DNA computing, in the literal sense, is the use of DNA (Deoxyribose Nucleic Acid) molecules, the molecules which encode genetic information for all living things, in computers. This is accomplished in a suspended solution of DNA, where certain combinations of DNA molecules are interpreted as a particular result to a problem encoded in the original molecules present. DNA computing is currently one of the fastest growing fields in both Computer Science and Biology, and its future looks extremely promising.

A highly interdisciplinary study incorporating the research results of computer scientists and biologists.  In the literal sense, DNA computing is the use of DNA molecules, which encode genetic information for all living things, in computers.  It is accomplished in a suspended solution of DNA, where certain combinations of DNA molecules are interpreted as a particular result to a problem encoded in the original molecules present.  Dr. Leonard Aldeman is a pioneer of DNA computing for his solution to solve Hamiltonian Path Problem using DNA strands.

 First and foremost, DNA computing is useful because it has a capacity lacked by all current electronics-based computers: its massively parallel nature. What does this mean, you ask? Well, essentially while DNA can only carry out computations slowly, DNA computers can perform a staggering number of calculations simultaneously; specifically, on the order of 10^9 calculations per mL of DNA per second! This capability of multiple cotemporal calculations immediately lends itself to several classes of problems which a modern electronic computer could never even approach solving. To give you an idea of the difference in time, a calculation that would take 10^22 modern computers working in parallel to complete in the span of one human's life would take one DNA computer only 1 year to polish off!

History of DNA Computation

 "Computers in the future may weigh no more than 1.5 tons." So said Popular Mechanics in 1949. Most of us today, in the age of smart cards and wearable PCs would find that statement laughable. We have made huge advances in miniaturisation since the days of room-sized computers, yet the underlying computational framework has remained the same. Today's supercomputers still employ the kind of sequential logic used by the mechanical dinosaurs of the 1930s. Some researchers are now looking beyond these boundaries and are investigating entirely new media and computational models. These include quantum, optical and DNA-based computers. It is the last of these developments that this paper concentrates on.

 The idea that living cells and molecular complexes can be viewed as potential machinic components dates back to the late 1950s, when Richard Feynman delivered his famous paper  describing "sub-microscopic" computers. More recently, several people have advocated the realisation of massively parallel computation using the techniques and chemistry of molecular biology.

 DNA computing was grounded in reality at the end of 1994, when Len Adleman of USC announced that he had solved a small instance of a computationally intractable problem using a small vial of DNA. By representing information as sequences of bases in DNA molecules, Adleman showed how to use existing DNA-manipulation techniques to implement a simple, massively parallel random search.

 Scientists at the Universities of Liverpool and Warwick (UK) are currently building a prototype DNA computer to solve a different problem . The problem of "colouring" has a long history. Given a map of mainland Europe, we know that we can colour each country one of four colours such that no two countries sharing a border are coloured the same. However, what happens if we lose a crayon? Can we still legally colour the map using only three colours? This problem is a member of the large class of notoriously intractable NP-complete problems, as are the Travelling Salesman and Bin Packing problems. These problems are characterised by an exponential- size search space; a problem of size 10 may take a fraction of a second to solve on a PC, but one of size 30 may take years.

 

DNA Fundamentals

 DNA, Deoxyribonucleic Acid, is the molecular basis of heredity and localized especially in most cell nucleus. DNA    molecules consist of two long chains held together by complementary base pairs.

 A DNA chain is a long, unbranched polymer composed of only four type subunits. These are the deoxyribonucleotides containing the bases adenine (A), cytosine(C), guanine (G), and thymine (T). The nucleotides are linked together by covalent phosphodiester bonds that join the 5’ carbon of one deoxyribose group to the 3’ carbon of the next. The four kinds of bases are attached to this repetitive sugar-phosphate chain. 

 The two long chains of a DNA molecule are held together by complementary base pairs.  Three hydrogen bonds form between G and C, and two hydrogen bonds exist between A and T.  The base-pairing mechanism is the basis for DNA replication. 

Figure 1         DNA Structure

As a direct consequence of the base-pairing mechanism, it becomes evident that DNA carries information by means of the linear sequence of its nucleotides. Each nucleotide-A, C, T, or G – can be considered a letter in a four-letter alphabet that is used to write our biological messages in a linear “ticker-tape” form. Organisms differ because their respective DNA molecules carry different nucleotide sequences and therefore different biological message.

Since the number of possible sequences in a DNA molecule, which is n nucleotides long, is 4ⁿ, the biological variety that could in principle be generated using even a modest length of DNA is enormous. A typical animal cell contains a meter of DNA (3*10⁹ Nucleotides). Written in a linear alphabet of four letters, an unusually small human gene would occupy a quarter of a page of text, while the genetic information carried in a human cell would fill a book of more than 500,000 pages.

 

The Hamiltonian Path Problem

Figure 1 shows a diagram of the Hamiltonian Path problem. The objective is to find a path from start to end going through all the points only once. This problem is difficult for conventional computers to solve because it is a "non-deterministic polynomial time problem" (NP). NP problems are intractable with deterministic (conventional/serial) computers, but can be solved using non-deterministic (massively parallel) computers. A DNA computer is a type of non-deterministic computer. The Hamiltonian Path problem was chosen because it is known as "NP-complete"; every NP problem can be reduced to a Hamiltonian Path problem.[4]

 

 

 

 

 

 

 

   

The following algorithm solves the Hamiltonian Path problem:

  1. Generate random paths through the graph.
  2. Keep only those paths that begin with the start city (A) and conclude with the end city (G).
  3. If the graph has n cities, keep only those paths with n cities. (n=7)
  4. Keep only those paths that enter all cities at least once.
  5. Any remaining paths are solutions.[1]

The key to solving the problem was using DNA to perform the five steps in the above algorithm. The following operations can be performed with DNA:

    • Synthesis of a desired strand
    • Separation of strands by length
    • Merging: pour two test tubes into one to perform union
    • Extraction: extract those strands containing a given pattern
    • Melting/Annealing: break/bond two ssDNA molecules with complementary sequences
    • Amplification: use PCR to make copies of DNA strands
    • Cutting: cut DNA with restriction enzymes
    • Ligation: Ligate DNA strands with complementary sticky ends using ligase
    • Detection: Confirm presence/absence of DNA in a given test tube[4]

The above operations can be used to "program" a DNA computer.

To implement step 1 of the algorithm, Adleman created a 20-mer sequence of DNA for each city A through G. For each path i>j, an oligonucleotide was created that was the 3' 10-mer of i and 5' 10-mer of j (see figure 2). A>B contained all of A and the 5' 10-mer of B, and E G contained the 3' 10-mer of E and all of G to eliminate any possible sticky ends. For each city i, a Watson-Crick complementary oligonucleotide was synthesized and designated ~i.

For each city except A and G, and for each path, 50 pmol of and 50 pmol of i>j, respectively, were mixed together in a single ligation reaction. The oligonucleotides served as splints to bring paths between common cities to associate for ligation. The ligation reaction therefore resulted in the formation of DNA molecules encoding random paths through the graph.

 

   

Step 2 was implemented by using A and ~G as primers for PCR to amplify all "paths" from starting with A and ending with G.

For step 3, the product of step 2 was run on an agarose gel, and the 140 bp band (corresponding to a dsDNA path entering exactly seven cities) was excised and soaked in double-distilled H2O to extract the DNA. This product was then PCR amplified and gel-purified several times to enhance its purity.

In order to complete step 4, the product of step 3 was affinity-purified with a biotin-avidin magnetic beads system. The dsDNA from step 3 was melted and the ssDNA product was incubated with ~A conjugated to magnetic beads. Only those ssDNA molecules that entered city A at least once were annealed to the bound ~B and retained. The process was successively repeated with ~C, ~D, ~E, and ~F.

Step 5 was performed by amplifying the product of step 4 with PCR and running it on a gel. Figure 3 shows the results of these procedures. In figure 3A, lane 1 is the result of the ligation reaction in step 1. The smear is consistent with the construction of molecules encoding random paths through the graph. Lanes 2-5 show the results of the PCR reaction in step 2. The densest bands correspond to the amplification of molecules encoding paths that began at A and ended at F.

Figure 3B shows graduated PCR of the final product of the computer. Graduated PCR allows one to "print" the results of a computation, using A as the right primer and ~B through ~G as the left primer lanes 1-7, respectively. The graduation shows the progression from A>B>C>D>E>F>G.

 

 

 

   

The above procedure took approximately one week to perform. Although this particular problem could be solved on a piece of paper in under an hour, when the number of cities is increased to 70, the problem becomes too complex for even a supercomputer. The fastest supercomputers can currently perform 1000 million instructions per second (MIPS); a single DNA molecule requires approximately 1000 seconds to perform an instruction (.001 MIPS). Obviously of you want to perform one calculation at a time (serial logic), DNA computers are not a viable option. However, if one wanted to perform many calculations simultaneously (parallel logic), a computer such as the one described above can easily perform 10^14 MIPS. DNA computers also require less energy and space. While existing supercomputers operate 10^9 operations per Joule, a DNA computer could perform 2 x 10^19 operations per Joule (10^10 times more efficient). Data can be stored on DNA at a density of approximately 1 bit per cubic nm, while existing storage media require 10^12 cubic nm to store 1 bit.[1]

Advantages and  Disadvantages

The advantages presented by a DNA computer are amazing. Their capacity for memory storage is tremendous. Also, they are inexpensive to build, being made of common biological materials. Many of the DNA molecules could be reused with a little splicing, so the whole computer is really materialistically very efficient. Then, their computational power is incomparable to anything in existence. First and foremost, DNA computing is useful because it has a capacity lacked by all current electronics-based computers: its massively parallel nature. What does this mean, you ask? Well, essentially while DNA can only carry out computations slowly, DNA computers can perform a staggering number of calculations simultaneously; specifically, on the order of 10^9 calculations per mL of DNA per second! This capability of multiple cotemporal calculations immediately lends itself to several classes of problems which a modern electronic computer could never even approach solving. To give you an idea of the difference in time, a calculation that would take 10^22 modern computers working in parallel to complete in the span of one human's life would take one DNA computer only 1 year to polish off!

 However, DNA computers do have their disadvantages. Although Adleman's first application of the computer took only milliseconds to produce a solution, it took about a week to fish the solution molecules out from the rest of the possible path molecules that had formed. To make these computers more realistically viable, the DNA splicing and selection equipment needs to be refined for this purpose and better methods for fishing developed. There is also no guarantee that the solution produced will necessarily be the absolute best solution, though it will certainly be a very good one, arrived at in a much shorter time than with a conventional computer. DNA computers could not (at this point) replace traditional computers. They are not programmable and the average dunce can not sit down at a familiar keyboard and get to work. Some think that in the future, computers will be a combination of the current models and DNA, using the most attractive features of both to create a vastly improved total product. However, research is ongoing in doing Boolean logic with DNA and designing universal (programmable) DNA computers.

 And of course we are talking about DNA here, the genetic code of life itself. It certainly has been the molecule of this century and most likely the next one. Considering all the attention that DNA has garnered, it isn’t too hard to imagine that one day we might have the tools and talent to produce a small integrated desktop machine that uses DNA, or a DNA-like biopolymer, as a computing substrate along with set of designer enzymes. Perhaps it won’t be used to play Quake IV or surf the web -- things that traditional computers are good at -- but it certainly might be used in the study of logic, encryption, genetic programming and algorithms, automata, language systems, and lots of other interesting things that haven't even been invented yet.

The Future of DNA Computing

Some centers of research in this area are at the University of Southern California at Los Angeles, with Dr. Adleman, Princeton, with Dr. Richard Lipton and his graduate students Dan Boneh and Eric Baum, and the  NEC Research Institute in Princeton, NJ. With others elsewhere, they are developing new branches in this young field. Advancements are being made in cryptography. Researchers are working on decreasing error in and damage to the DNA during the computations/reactions. The Princeton contingent has published papers on models for universal DNA computers, while others have described methods for doing addition and matrix multiplication with these computers.

 Currently, molecular computing is a field with a great deal of potential, but few results of practical value. In the wake of Adleman's solution of the Hamiltonian path problem, there came a host of other articles on computation with DNA; however, most of them were purely theoretical. Currently, a functional DNA "computer" of the type most people are familiar with lies many years in the future. But work continues: in his article Speeding Up Computation via Molecular Biology Lipton shows how DNA can be used to construct a Turing machine, a universal computer capable of performing any calculation. While it currently exists only in theory, it's possible that in the years to come computers based on the work of Adleman, Lipton, and others will come to replace traditional silicon-based machines.

 The field of DNA computing is truly exciting for the revolution it implies will occur within the next few years. It also demonstrates the current trend of merging and lack of distinction between the sciences, where a computer scientist can mess around with biology equipment and come up with something new and valuable.

References

[1] DIMACS Proceedings: DNA Based Computers I (#27), II (#44), III (#48), IV (Special Issue of Biosystems), V(MIT, June 1999)

 [2] Other: Genetic Programming 1 (Stanford, 1997), Genetic Programming 2 (Wisconsin-Madison, 1998), IEEE International Conference on Evolutionary Computation (Indianapolis, 1997)

 [3] Richard P. Feynman, There's Plenty of Room at the Bottom. In D. Gilbert, ed., Miniaturization, 282-296. Reinhold (1961)

 [4] Leonard Adleman, Molecular computation of solutions to combinatorial problems. Science 266, 1021-1024 (1994)

 [5] Martyn Amos, Alan Gibbons and David Hodgson, Error-resistant Implementation of DNA Computations. In Proceedings of the Second Annual Meeting on DNA Based Computers, Princeton University. American Mathematical Society (1996) (To appear)

 [6] Michael R. Garey and David S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Company (1979)

 [7] http://www.wi.leidenuniv.nl/~jdassen/dna.html

 

[8] http://dope.caltech.edu/winfree/DNA.html

      

[9] http://www.msci.memphis.edu/~garzonm/bmc.html