Gene Sequence Visualization and Sonification

by M. Quinn


The following images are created by Gene Sequence Sonification/Visualization software developed by Martin Quinn.

Angles of movement are used to represent the transition from one nucleic acid to the next.

The twelve possible transitions are described by moving in some direction from each successive step in the sequence, considering the current step to be the center of an imaginary clock.
The twelve hours of this genomic clock are assigned to transitions as follows:

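  1. CT: 270 degrees, 12 o'clock
  2. GC: 300 degrees, 1 o'clock
  3. TA: 330 degrees, 2 o'clock
  4. AG: 360 degrees, 3 o'clock
  5. AT: 30 degrees, 4 o'clock
  6. CG: 60 degrees, 5 o'clock
  7. GA: 105 degrees, 6 o'clock (positions from 6 through 11 o'clock are offset by 15 degrees)
  8. AC: 135 degrees, 7 o'clock
  9. GT: 165 degrees, 8 o'clock
  10. TC: 195 degrees, 9 o'clock
  11. CA: 225 degrees, 10 o'clock
  12. TG: 255 degrees, 11 o'clock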
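
The assignments above appear to follow a simple rule: start at 270 degrees for 12 o'clock, add 30 degrees per hour, and add a further 15 degrees for 6 through 11 o'clock. The following Python sketch reproduces the list under that reading. The function and variable names are illustrative and are not taken from the original software.

```python
def clock_angle(hour: int) -> int:
    """Angle in degrees for a clock hour (1-12), following the list above."""
    if not 1 <= hour <= 12:
        raise ValueError("hour must be in 1..12")
    # 12 o'clock is 270 degrees and each hour adds 30 degrees.
    # The result is wrapped into 1..360 so that 3 o'clock is 360, not 0.
    base = (270 + 30 * (hour % 12) - 1) % 360 + 1
    # Hours 6 through 11 are offset by 15 degrees.
    return base + 15 if 6 <= hour <= 11 else base


# Clock hour assigned to each transition in the list above.
TRANSITION_HOURS = {
    "CT": 12, "GC": 1, "TA": 2, "AG": 3, "AT": 4, "CG": 5,
    "GA": 6, "AC": 7, "GT": 8, "TC": 9, "CA": 10, "TG": 11,
}

TRANSITION_ANGLES = {pair: clock_angle(hour) for pair, hour in TRANSITION_HOURS.items()}
```

Looking up `TRANSITION_ANGLES["GA"]` gives the offset 6 o'clock position, 105 degrees.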

Mouse (image)

Human (image)

Ribosomal L14 (image)
Audio: RibosomalL14.aiff, L14AA+TERM.aiff

Bactrio.. (image)

CBS (image)

Random generated sequence (image)

sAVH6 satellite tripeptide sequence in middle (image)

Genes 1 (image)

Genes 2 (image)

The following image transforms text into a design by using similar angle assignment algorithms, but based on divisions of the circle into 26 parts.

Text converted to geometry


Sonification Examples

In the following example, we present an initial mapping from DNA to music in the form of an audio legend. We use an invented sequence as follows:
 

Audio Legend aiff sound file (long)
      1. acacacacacacacacac
      2. cccccccccccccccccc
      3. agagagagagagagagag
      4. gggggggggggggggggg
      5. atatatatatatatatat
      6. tttttttttttttttttt
      7. cacacacacacacacaca
      8. aaaaaaaaaaaaaaaaaa
      9. cgcgcgcgcgcgcgcgcg
      10. gggggggggggggggggg
      11. ctctctctctctctctct
      12. tttttttttttttttttt
      13. gagagagagagagagaga
      14. aaaaaaaaaaaaaaaaaa
      15. gcgcgcgcgcgcgcgcgc
      16. cccccccccccccccccc
      17. gtgtgtgtgtgtgtgtgt
      18. tttttttttttttttttt
      19. tatatatatatatatata
      20. aaaaaaaaaaaaaaaaaa
      21. tctctctctctctctctc
      22. cccccccccccccccccc
      23. tgtgtgtgtgtgtgtgtg
      24. gggggggggggggggggg
The data is mapped to sound so that structure can be recognized by ear. Three or more levels of sequence information can be presented simultaneously. The single nucleic acids A, C, G and T sound as parts of a drum set: A sounds as a snAre drum, C as a Cymbal, G as a closed hi(Gh)hat cymbal, and T as a Tom or bass drum. In each case, a sound of the assigned type is played using intelligent rhythm selection technology to enhance long listening and musicality.

Each of the twelve possible transitions from one letter to the next is represented by its own note from a C major scale, played on marimba. The sample sequence is designed to highlight these transitions.
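
The 24 legend lines listed above follow a regular pattern: each of the twelve ordered letter pairs, taken in alphabetical order, is repeated across an 18-letter line, and is followed by a line that repeats the pair's second letter. As a minimal sketch (the function name is illustrative, not part of the original software), the legend can be regenerated like this:

```python
from itertools import permutations

BASES = "acgt"
LINE_LENGTH = 18  # every legend line above is 18 letters long


def audio_legend_lines():
    """Rebuild the 24 legend lines: a pair line, then a second-letter line."""
    lines = []
    # permutations() yields the 12 ordered pairs of distinct letters
    # in alphabetical order: ac, ag, at, ca, cg, ct, ...
    for first, second in permutations(BASES, 2):
        lines.append((first + second) * (LINE_LENGTH // 2))
        lines.append(second * LINE_LENGTH)
    return lines


legend = audio_legend_lines()
```

Each pair line sounds the two transitions it contains in alternation (for example ac and ca in line 1), while the single-letter line that follows contains no transition at all, so only that letter's drum sound is heard.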

Amino acid triplets are assigned a string section sound. Hydrophobic amino acids sound much higher than the other amino acids, which sound below the marimba pitches. All notes are again selected from a C major scale.



Ribosomal L14 with a long 'staircase' tripeptide in the middle of it.

Ribosomal L14 with 30 acgt from salmon inserted in the middle of it. The left side plays the original sequence; the right side plays the mutated, slightly longer sequence.


The goal of this feasibility project is acute biological warfare defense through rapid anti-sense determination, using an advanced strategic audio biowar defense workstation (ASABD) to direct the synthesis of DNA and RNA active drugs in the CBD (chemical and biological defense) command outpost.

The key to this design is the use of advanced computer audio, haptic and graphic information display to characterize the threat and to search for candidate matching anti-sense and binding site reagents in minutes. Interpretation of this data is augmented by computer, but experienced multi-media human pattern perception is key to determining the threat and to selecting and synthesizing the defense reagent. We see this prototype system as the bio-warfare defense equivalent of an advanced threat analysis system that combines sonar (auditory) with radar (visual).

Computer Aided Surgery, Inc. seeks to build on its base in the conceptualization, design, and implementation of advanced tactical audio displays for neurosurgery and to break new ground in strategic threat display and tactical countermeasure selection and synthesis. We propose to prototype, using elements of our existing SGI computer graphics and Lake DSP sonification, a novel display technology for sniffer and Merlin data and other multiple reagent binding data, and to build a display system that enables the user to rapidly characterize a digital Morse decomposition of the affinity tree data. Display of the tree should then enable the user to make tactical decisions about defensive reagents that can block the DNA binding site(s) and to order their dispersal to civilians or troops at risk.


Genometry

Genometry is a geometric and sonic representation of the structure of DNA sequences. The representation can include several levels of data sequence history. For instance, a no-history representation is concerned only with the four individual nucleic acids: a, c, g, t. A two-character history, on the other hand, is concerned with the current letter and the prior letter; this is considered a two-character transition from one letter to the next. For acgt DNA data, there are twelve such transitions (each of the four letters can be followed by any of the other three). Three-character transitions consist of the amino acid combinations. There are 64 possible three-character combinations, which condense to 20 amino acid forms plus termination signals.
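
The counts quoted above can be checked by enumeration. This is a small illustrative sketch in Python; the variable names are not from the original software.

```python
from itertools import product

BASES = "acgt"

# No history: the four individual letters.
SINGLES = list(BASES)

# Two-character history: ordered pairs in which the letter actually changes.
TRANSITIONS = [a + b for a, b in product(BASES, repeat=2) if a != b]

# Three-character history: every ordered triplet of letters.
TRIPLETS = ["".join(t) for t in product(BASES, repeat=3)]

print(len(SINGLES), len(TRANSITIONS), len(TRIPLETS))
```

Pairs such as aa or cc are left out of the transition count because the letter does not change.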

Genometry uses colored lines at specific angles to represent sequence information. Each angle is calculated from each successive point, considering that point the center of an imaginary circle. The length of the line can be adjusted to display more or less data. Angles are measured from the horizontal: a move straight to the right is an angle of 0 or 360 degrees.
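
As a rough sketch of how such a line could be traced, the following Python function turns a list of angles into a list of points. It assumes screen-style coordinates in which y grows downward, which is consistent with 270 degrees pointing to 12 o'clock in the clock assignment; the function name, the `step` parameter and that coordinate convention are assumptions, not details of the original software.

```python
import math


def walk(angles_deg, step=1.0, start=(0.0, 0.0)):
    """Return the points visited when moving `step` units at each angle in turn."""
    x, y = start
    points = [(x, y)]
    for angle in angles_deg:
        theta = math.radians(angle)
        x += step * math.cos(theta)
        # Assumed screen coordinates: positive y is down, so 270 degrees moves up.
        y += step * math.sin(theta)
        points.append((x, y))
    return points


# Example: one move to the right (360) followed by one move straight up (270).
path = walk([360, 270], step=2.0)
```

Changing `step` corresponds to adjusting the line length to show more or less data in the same drawing area.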

Different historical views use different sets of angles to represent the information. Individual nucleic acids use angles of 270, 360, 315 and 45. This creates a line that never moves to the left and never crosses itself.
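
A short check of why these four angles behave this way: none of them has a leftward horizontal component, and the only move with no horizontal progress (270) always points in the same direction, so the line cannot double back over itself. The Python sketch below only verifies the horizontal components; the names are illustrative.

```python
import math

# Angles quoted in the text for the four single letters.
SINGLE_LETTER_ANGLES = (270, 360, 315, 45)


def horizontal_step(angle_deg):
    """Horizontal displacement of a unit-length move at the given angle."""
    # Rounding removes floating-point noise, e.g. cos(270 degrees) is about -1.8e-16.
    return round(math.cos(math.radians(angle_deg)), 9)


steps = {angle: horizontal_step(angle) for angle in SINGLE_LETTER_ANGLES}
never_moves_left = all(dx >= 0 for dx in steps.values())
```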

The sonic representation of this lowest level of organization assigns various parts of a drum set to the letters a, c, g and t. Each letter is heard not as one fixed sound but as a representative sound from a particular drum set group. For 'a' we hear a snare drum, for 'c' a cymbal, for 'g' a closed high hat cymbal effect, and for 't' tom toms and bass drum. This technique uses patterns for each group that do not repeat quickly and so add to the musical quality of the listening experience.

For the visual representation of two-letter transitions we use a set of 12 angles generally corresponding to the positions on the face of a clock. For the hours 6 through 11, however, we add 15 degrees to their normal positions. The following code snippet lists the assigned angles.

static int ct = 270; // 12 o'clock
static int gc = 300; // 1 o'clock
static int ta = 330; // 2 o'clock
static int ag = 360; // 3 o'clock
static int at = 30;  // 4 o'clock
static int cg = 60;  // 5 o'clock
static int ga = 105; // 6 o'clock (hours 6 through 11 are offset by 15 degrees)
static int ac = 135; // 7 o'clock
static int gt = 165; // 8 o'clock
static int tc = 195; // 9 o'clock
static int ca = 225; // 10 o'clock
static int tg = 255; // 11 o'clock

The sonic representation assigns the angles, starting with gc at 300, to notes out of a C scale starting from middle C and played on marimba.
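
One possible reading of this assignment is sketched below in Python. It assumes a C major scale, MIDI note numbers with middle C as 60, and that the transitions are taken in clock order from 1 o'clock (gc) around to 12 o'clock (ct). All three are assumptions; the text only states the starting transition, the scale and the instrument.

```python
# Semitone offsets of the C major scale degrees within one octave.
C_MAJOR_STEPS = (0, 2, 4, 5, 7, 9, 11)
MIDDLE_C = 60  # assumed MIDI note number for middle C

# Transitions in clock order, starting with gc at 1 o'clock (300 degrees).
CLOCK_ORDER = ("gc", "ta", "ag", "at", "cg", "ga",
               "ac", "gt", "tc", "ca", "tg", "ct")


def scale_note(degree):
    """MIDI note for the given scale degree, counting up from middle C."""
    octave, step = divmod(degree, len(C_MAJOR_STEPS))
    return MIDDLE_C + 12 * octave + C_MAJOR_STEPS[step]


TRANSITION_NOTES = {pair: scale_note(i) for i, pair in enumerate(CLOCK_ORDER)}
```

Under these assumptions the twelve transitions span an octave and a fifth, from middle C for gc up to the G above the next C for ct.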

For the amino acid display we use a set of 20 angles assigned to highlight the following characteristics.

strongly basic (+) - k,r near 3 o'clock

strongly acidic (-) - d,e near 9 o'clock

hydrophobic - a,i,l,f,w,v spread near 6 o'clock

polar - n,c,q,s,t,y spread near 12 o'clock

all others are neutral (p, g, etc.) spread in 4 directions: nw, ne, se, sw
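
The grouping above can be written as a small lookup. This Python sketch assumes the standard one-letter amino acid codes and treats every code not named in the first four groups as neutral; the exact angle of each amino acid within its region is not given in the text and is not modelled here. The names are illustrative.

```python
# Groups and their members as listed above (one-letter amino acid codes).
AMINO_ACID_GROUPS = {
    "strongly basic": "kr",
    "strongly acidic": "de",
    "hydrophobic": "ailfwv",
    "polar": "ncqsty",
}

# Region of the clock face assigned to each group.
GROUP_REGION = {
    "strongly basic": "near 3 o'clock",
    "strongly acidic": "near 9 o'clock",
    "hydrophobic": "near 6 o'clock",
    "polar": "near 12 o'clock",
    "neutral": "nw, ne, se, sw",
}

# The 20 standard one-letter codes (assumed).
ALL_AMINO_ACIDS = "acdefghiklmnpqrstvwy"


def classify(code):
    """Return the group name for a one-letter amino acid code."""
    code = code.lower()
    if len(code) != 1 or code not in ALL_AMINO_ACIDS:
        raise ValueError(f"not a standard amino acid code: {code!r}")
    for group, members in AMINO_ACID_GROUPS.items():
        if code in members:
            return group
    return "neutral"


neutral = [aa for aa in ALL_AMINO_ACIDS if classify(aa) == "neutral"]
```

By elimination, four codes are left for the neutral group (g, h, m and p), which matches the four diagonal directions.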
 

The amino acids are played by strings: currently the hydrophobic ones are heard as very high notes and the others as much lower notes. The terminating amino acids have an additional sound of a big low drum associated with them to highlight the overall structure.

Various techniques will be employed to sonify dna information.

 
- Musical tourings at different zoom levels.

Controlled by mouse or 3D tracking device. Allows for both 'formal' and 'informal' repeatable tours of the data. Allows waving/scanning over and through the data, with resulting sound that reflects the data terrain.

The data will be organized into as many as 7 zoom levels corresponding to the current levels seen in the genome viewer by Will Gilbert at http://bioinformatics.unh.edu.

- The functionality of regions will map to changes in music.

The primary focus and goal is transcription and translation areas, thereby identifying promising areas for anti-sense during touring.

- Translation of information into highly polyphonic music.

Scales of instruments, pitches, dynamics, rhythms, patterns, and control changes, along with key word driven textual musical translations, identify regions and sequences in the DNA data by musical texture. Additional musical underpinnings highlight intelligent context information of functional sections identified from gene research.

A formal method allows a tour of the data within an appropriate length of time, not to exceed 5-10 minutes.

A sonic counterpart will highlight and identify agct transitions. The sequence of musical events that corresponds to the twelve possible transitions between acgt is represented visually by geometric elements employing twelve angles or other such visual components. Presenting agct sequences as natural looking coastlines and shapes allows the visual pattern recognition facilities to be integrated with sonic equivalents, reinforcing the pattern matching capabilities of the user.
 

Play options will allow for the following:

- Play a grouping of components as an initial chord followed by a series of chords that represent the sequences contained within that grouping.

In other words, each component is represented by one or more notes of a particular kind of instrument. The sounds that follow carry the individual information in each component. Say component x has 30 amino acids and component y has 20. You would hear 10 more notes from component x after hearing 20 notes in polyphony from both components. The groupings are separated in 3D sound.

- Fly over the data in a sonic plane.

Use 3D sound to identify where we are and rhythmic presentation to identify how fast we are travelling. Up high we hear conglomerate sounds. As we zoom in, we get individual components such as proteins, then individual cgta (or however many levels of groupings are identified).

- Represent large (3 billion) sequences as sound wave forms that are simply played.

In a sense we create a drum head out of the data and play it. If the drum head is played at a few positions, we hear harmonic differences due to the underlying data characteristics.

- The DNA shaker.

A kind of DNA shaker facilitates data touring. Consider the shaker's boundaries as being defined by the DNA data itself. At any particular level and/or area of interest, a set number of sound beans can be shaken around within that data space. When the conceptual beans hit the side of the shaker, the data is played at that location. The sound response is fast, as in a real shaker. Users define the level of detail and the zoom data level heard at any one time. Simple gestures or other input control the zoom level. The user selects the data areas that become the shaker's surface.
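
The grouped play option described above (components sounding together until the shorter ones run out) can be sketched as follows. This Python example is an illustration only: the function name and the use of plain numbers as notes are assumptions, and the initial identifying chord and the 3D placement are left out.

```python
from itertools import zip_longest

_FINISHED = object()  # marks a component that has no more notes to play


def chord_sequence(components):
    """Turn {component name: list of notes} into a list of chords.

    Each chord is a dict holding one note for every component that is
    still sounding at that step.
    """
    names = list(components)
    chords = []
    for step in zip_longest(*components.values(), fillvalue=_FINISHED):
        chords.append({name: note for name, note in zip(names, step)
                       if note is not _FINISHED})
    return chords


# Component x has 30 amino acids and component y has 20, as in the text.
chords = chord_sequence({"x": list(range(30)), "y": list(range(100, 120))})
```

With these inputs the first 20 chords contain a note from both components and the last 10 contain notes from component x alone.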

Mr. Martin S. Quinn is a composer and computer scientist, well known for the 'Climate Symphony', a 6 minute sonification of 110,000 years of ice core climate data. He specializes in innovative musical generation software based on drumming and combinatoric principles. His work is being well received in scientific presentations throughout the world. He has also created novel geometric methods for perceiving and comparing rhythms and sequences. His music work with the DOAH World Music Ensemble reached #7 in the Billboard charts.