Sunday, January 22, 2017

Confucius in 90 minutes

Confucius in 90 minutes - Paul Strathern

Confucius (Kungfutzu) lived around 500 BC (551-479 BC) in north central coastal China. After his education and work in various professions, he started a successful school for bureaucrats, where he taught his philosophies of conduct and ethics. Among his students were the sons of rulers. Later in his life he traveled through China, meeting and advising rulers of various states.

Confucianism was a philosophy that evolved out of his teachings. Though he is widely considered a philosopher, his teachings were practical rather than religious or metaphysical. This contrasted with the other major philosophy of the time, Taoism ("the way"), which dealt with metaphysics. Confucianism dealt with the conduct and morality of rulers, bureaucrats, and citizens. The central premise was that the ordinary activities of individuals are sacred and must be conducted in an ethical manner. The central concept is "jen" - a quality of magnanimity, virtue and honesty which every individual should strive for. The goal was to produce a society of individuals who live a life of harmony and virtue.

Confucius' thoughts are mainly encapsulated in pithy sayings documented in his book of sayings - the Analects. Confucianism has other sacred texts (some predated Confucius, others were edited by his followers), called the Four Books (one of which is the Analects) and the Five Classics (among which are the I Ching, or Book of Changes, dealing with metaphysics and the cosmos as an interaction of yin and yang, the Book of Poetry and the Book of History). They are considered to broadly encapsulate Confucian thought, and have influenced Chinese culture for over two thousand years.

Sunday, November 24, 2013

The Physics of Wall Street

The Physics of Wall Street - James Owen Weatherall


The book describes the evolution of quantitative trading through a history of the major principles involved in building the financial models, the researchers who proposed them, their impact and the reasons behind their failures. Each chapter deals with a single principle behind a model. Successive chapters follow the evolution of these principles, starting with the random walk model, progressing through delta/dynamic hedging, chaos theory, black box modelling and ending with extreme event detection through log periodic variations. A good read for someone who is interested in understanding the basics of quantitative finance. Recommend.

Synopsis:

Chapter 1: Louis Bachelier, "A theory of speculation"

  • Bachelier's dissertation ("A theory of speculation") proposed that random walks could be used to model stock prices.
  • Valid if the trade is a fair bet. Intuitively, a trade means the buyer believes the information is positive and the seller believes it is negative, so the trade price is the price at which the probability of going up equals the probability of going down. The logic is equivalent to the Efficient Market Hypothesis (Fama, Chicago School).
  • If stock prices follow a random walk, the distance from the starting point is normally distributed. The distribution of the future price at a given time is normal, with its mean at the starting price and a variance that grows with time, so the normal curve becomes flatter as time increases.
  • Bachelier extended the model to options/derivatives. The fair price of an option is the price that would make it a fair bet. He used the random walk to calculate the probability of a future price, and derived a fair price estimate.
  • The basis for the model is somewhat flawed, e.g. if the efficient market hypothesis were true, bubbles could not happen. Also, the model was not fully validated with real data.
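
As a rough illustration of the variance point (a sketch, not Bachelier's own derivation): for a fair walk with +1/-1 steps, the exact distribution after n steps can be enumerated, and its variance equals n. The unit step size and the function names are assumptions made for this example.

```python
from math import comb

def walk_distribution(steps):
    """Exact distribution of the position after `steps` fair +1/-1 moves."""
    total = 2 ** steps
    # `ups` up-moves and (steps - ups) down-moves end at position 2*ups - steps.
    return {2 * ups - steps: comb(steps, ups) / total for ups in range(steps + 1)}

def variance(dist):
    """Variance of a {value: probability} distribution."""
    mean = sum(x * p for x, p in dist.items())
    return sum((x - mean) ** 2 * p for x, p in dist.items())

# The spread widens as the number of steps (time) grows.
for n in (4, 16, 64):
    print(n, variance(walk_distribution(n)))
```

The mean stays at the starting point while the variance grows in proportion to the number of steps, which is the flattening of the curve described above.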

Chapter 2: Maury Osborne, "Brownian motion in the stock market".

  • If Bachelier's hypothesis were right, stock prices would be normally distributed, which was not supported by real data. Osborne showed that rates of return are normally distributed and follow a random walk (prices change by a fixed percentage, not by a fixed amount), i.e. prices are log-normally distributed.
  • Has an intuitive basis: investors do not care about absolute price, they care about rate of return. It also follows from the Weber-Fechner psychological principle: logarithms model human response to stimuli.
  • The hypothesis that markets are random seems to indicate that in the long term, investments will yield no gain. However, estimating future values of options can be used to develop instruments that yield a profit.
  • Later, Osborne rejected the memoryless efficient market hypothesis in favour of memory-based models: after prices go up they are likely to go down and vice versa. This is a fundamental change in assumption from the random walk.
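
A small sketch of the "fixed percentage, not fixed amount" idea, using made-up returns: applying percentage returns multiplies the price, so the price can never go below zero, and the logarithm of the price moves by additive steps. The function names and the sample numbers are assumptions for illustration only.

```python
import math

def apply_returns(start_price, returns):
    """Apply a sequence of fractional returns (0.10 means +10%) to a price."""
    prices = [start_price]
    for r in returns:
        prices.append(prices[-1] * (1.0 + r))
    return prices

def log_steps(prices):
    """Step sizes of the walk taken by log(price)."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]

prices = apply_returns(100.0, [0.10, -0.10, 0.05])
print(prices)
print(sum(log_steps(prices)), math.log(prices[-1] / prices[0]))
```

The two printed log values agree: the log steps add up to the log of the overall price ratio, which is why a random walk in returns gives log-normally distributed prices.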

Chapter 3: Benoit Mandelbrot,  "The fractal geometry of nature"


  • Mandelbrot's work showed that real market returns are governed by Levy stable distributions with 1 < alpha < 2, i.e. long tailed distributions.
    Extreme events occur much more often than the normal distribution predicts. This makes random walk based models obsolete.
  • Mandelbrot's theories emerged around the same time that random walks were becoming established in financial modeling, but did not gain much traction because long tailed models are complex and not tractable, while the random walk gives good results "most" of the time.
  • Notes on long tailed distributions:
    • Levy stable distribution: alpha characterizes the tail. Normal: alpha = 2, Cauchy: alpha = 1 (alpha <= 1 => distribution has no average). Self similar features have no average
    • Zipf's law: frequency of occurrence of an event is inversely related to its rank
    • Pareto principle: 80:20 rule
    • Cauchy distribution: an example of a long tailed distribution (alpha = 1)
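
To give a feel for how much heavier a long tail is, here is a sketch comparing the two textbook endpoints mentioned above, the standard normal (alpha = 2) and the standard Cauchy (alpha = 1). It uses the closed-form tail probabilities of those two distributions. The comparison is per scale unit rather than per standard deviation, since the Cauchy distribution has no finite variance. Market returns with 1 < alpha < 2 would fall somewhere in between.

```python
import math

def normal_tail(k):
    """P(|X| > k) for a standard normal variable."""
    return math.erfc(k / math.sqrt(2))

def cauchy_tail(k):
    """P(|X| > k) for a standard Cauchy variable (Levy stable, alpha = 1)."""
    return 1.0 - (2.0 / math.pi) * math.atan(k)

for k in (1, 3, 5, 10):
    print(k, normal_tail(k), cauchy_tail(k))
```

At small k the two are comparable, but the normal tail collapses rapidly while the Cauchy tail shrinks only in proportion to 1/k.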

Chapter 4: Edward Thorp, "Delta hedging"


  • Bachelier, Osborne, Mandelbrot did not apply their theories to real investments. Ed Thorp was the first to apply their theories to the market
  • Thorp developed a card counting strategy to gain a favourable edge at 21 (blackjack). If you have a strategy (edge) that is probabilistically profitable in the long run, how do you size your bets to avoid "Gambler's ruin"? Thorp used Kelly's criterion, which comes from information theory (the probability that a message distorted by noise is read correctly). The Kelly criterion specifies the optimal fraction to bet on a favoured bet, given the advantage and the payout, and shows that the rate of return equals the information rate.
  • Applied the strategy to options (warrants):
    • Used the Osborne/Bachelier equations to estimate how much a warrant should be worth. Thorp found that most options were overpriced according to pricing theory. This provided an edge in the warrant market (not the stock market).
    • Used short selling of options to exploit the edge. Short selling allows investors to bet against a stock without owning it.
    • Thorp hedged the short sale of warrants against the underlying stock - the first hedge fund. Holding the underlying stock protects against an increase in the option value. This protects against all but large changes in the stock value: it controls risk, but does not eliminate it.
    • Procedure: The fair price of an option is the price at which it is a fair bet. Assume stock prices are log normal, calculate the option price, then calculate the proportion of stocks and options needed to execute delta hedging.
  • Even though hedging guarantees profit, long and short term profits are taxed differently, reducing profits.
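
The bet-sizing idea can be sketched with the commonly quoted form of the Kelly formula for a simple win/lose bet, f = (b*p - q) / b, where p is the win probability, q = 1 - p and b is the payout per unit staked. That formula is the standard textbook statement, not something spelled out in these notes, and the 55% edge below is a made-up example.

```python
import math

def kelly_fraction(p, b):
    """Fraction of the bankroll to stake when a win (probability p) pays b-to-1."""
    q = 1.0 - p
    return (b * p - q) / b

def growth_rate(f, p, b):
    """Expected log growth of the bankroll per bet when staking fraction f."""
    return p * math.log(1.0 + b * f) + (1.0 - p) * math.log(1.0 - f)

p, b = 0.55, 1.0            # hypothetical edge: 55% win chance at even money
f_star = kelly_fraction(p, b)
for f in (0.05, f_star, 0.20, 0.40):
    print(round(f, 2), growth_rate(f, p, b))
```

Betting less than the Kelly fraction grows the bankroll more slowly, and betting much more turns a favourable bet into a losing one in the long run, which is the "Gambler's ruin" risk mentioned above.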

Chapter 5: Fischer Black, "Black-Scholes-Merton options pricing model",  "Dynamic hedging"

  • CAPM (Capital Asset Pricing Model): A model that assigns a price to risk. Links risk and return via a cost benefit analysis of risk premiums
  • Dynamic hedging: It is always possible to construct a portfolio consisting of an asset and its option that is always risk free
  • Procedure:
    • Assume there exists a mix of stock/options to construct a risk free portfolio
    • Use CAPM to calculate risk free rate
    • Calculate price of options in order to realize the risk free return
  • Allowed banks to construct options and sell them. Banks could sell options and reduce their risk by buying the corresponding asset
  • 1987 crash: Portfolio insurance based hedge fund. O'Connor used a modified Black-Scholes model that accounted for long tail events, and was not impacted by the 1987 crash
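
For reference, a sketch of the widely published closed-form Black-Scholes price of a European call on a stock paying no dividends, together with its delta (the number of shares that offsets one option). This is the standard textbook formula rather than the CAPM-based derivation summarized above, and the input values are hypothetical.

```python
import math

def norm_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(spot, strike, rate, vol, years):
    """Return (price, delta) of a European call under Black-Scholes."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * years) / (vol * math.sqrt(years))
    d2 = d1 - vol * math.sqrt(years)
    price = spot * norm_cdf(d1) - strike * math.exp(-rate * years) * norm_cdf(d2)
    return price, norm_cdf(d1)

price, delta = bs_call(100.0, 100.0, 0.05, 0.20, 1.0)

# Hedged position: long `delta` shares, short one call.
# A small move in the stock should leave its value almost unchanged.
bumped, _ = bs_call(100.01, 100.0, 0.05, 0.20, 1.0)
hedge_pnl = delta * 0.01 - (bumped - price)
print(price, delta, hedge_pnl)
```

The hedge only holds for small moves: delta changes as the stock price moves, so the mix has to be rebalanced continually, hence "dynamic" hedging.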

Chapter 6:  The Prediction Company, Lorenz, "Chaos theory"

  • Lorenz developed chaos theory - sensitive dependence of state on initial conditions
  • The Prediction Company was started by a group of physicists with expertise in chaos theory and prediction algorithms. Their objective was to find the signal in the noise, applying their understanding of chaos and genetic algorithms. They developed statistical arbitrage on correlated assets, e.g. pairs trading, and algorithms based on voting for trades
  • Their most significant contribution was black box modelling - building black boxes that predicted based on accuracy on past real data (training sets, etc.)
  • One premise was that markets are inherently unpredictable and obey the efficient market hypothesis, which implies they should be impossible to predict. However, if anomalies (e.g. a stock price is away from its normal, expected value) are detected, they can be exploited before the market returns to equilibrium. This needs computation power and speed to detect anomalies and take action
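
A toy sketch of the anomaly idea behind pairs trading, not the Prediction Company's actual method (which the notes do not describe): track the spread between two correlated assets and flag it when it drifts unusually far from its historical mean. The z-score rule, the threshold of 2 and the sample numbers are all assumptions for illustration.

```python
from statistics import mean, pstdev

def spread_zscore(history, latest):
    """How many standard deviations `latest` sits from the historical mean spread."""
    return (latest - mean(history)) / pstdev(history)

def signal(z, threshold=2.0):
    """Hypothetical rule: bet on the spread reverting once |z| exceeds the threshold."""
    if z > threshold:
        return "short spread"
    if z < -threshold:
        return "long spread"
    return "no trade"

history = [1.00, 1.10, 0.90, 1.00, 1.05, 0.95]   # made-up spread values
z = spread_zscore(history, 1.50)
print(z, signal(z))
```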

Chapter 7: Didier Sornette: "Self organisation"

  • Ruptures in physical systems result from a self organization of components. Self organisation: uncorrelated entities begin to join together in correlated behaviour
    Log periodic patterns predict ruptures. Used to predict breaking of water tanks, earthquakes.
  • Specific crashes (Dragon Kings) may be caused by the state of the market rather than a particular event. They are more extreme than long tailed events, and may be predictable through log periodic observations. Self organisation is difficult to predict and has fractal properties, but log periodic behaviour in properties may indicate the system is in a dragon king state
  • Predicted the 1997 Asian currency crash, 2000 dotcom crash

Chapter 8: New Manhattan Project

  • Gauge theory and its application in calculating a new CPI

Notes:

  • Renaissance Technologies: 
    • Medallion Hedge Fund, approx 2500% return  (compared to 1700% Soros)
    • 40% over lifetime, compared to 20% (Berkshire Hathaway)
    • 80% return in 2008 during the crisis
    • One asset is usually a derivative
  • Derivative: Contract based on some kind of security: stock, bond, commodity
    • Objective: Reduce risk (historically with commodity futures), now with stock futures
  • Hedge fund: Counterbalanced portfolio comprising an asset and its derivative. Calculate relationship between derivative prices and underlying asset price, quantify risk of a fund based on derivatives, keep portfolio in balance.
  • 1971: Chicago Securities Board allowed the first options market
  • Bretton Woods agreement, 1944: Fixed exchange rates, all currencies tied to the dollar, dollar tied to gold. Abandoned by Nixon on recommendation from Milton Friedman (Chicago school). Currency futures became widely traded after this.
  • 1987 crash: Portfolio insurance: Hedge: Buying a stock, short selling futures. Volatility smile: Abnormality in options pricing graphs because of shortcomings of the Black-Scholes model
  • 2007 crash: Banks needed an asset that was like a treasury bond (low risk), that they could provide as collateral on deposits from corporations/other banks (shadow banking system). They used consumer debt (mortgage, credit card, student loans) - Collateralized Debt Obligations (CDO). The shadow banking system collapsed when the underlying assets became toxic. Mathematical models made a flawed assumption of the independence of failure of individual assets (mortgages). Failure was followed by a run on the banks.

Tuesday, March 12, 2013

What to listen for in music

What to listen for in music: Aaron Copland, 1939

The book summarizes the basics needed to understand and appreciate music at a reasonably deep level. It focuses mostly on western classical music, with some mention of jazz. It covers the process of listening and composing, the 4 major elements (rhythm, melody, harmony, tone color) and musical structure (4 major forms in western classical music). Was written in the 30s, so the focus and writing style reflect that.

Well worth the read for someone (like me) with no formal music training.

Preliminaries

Most people have the prerequisites to developing an appreciation of music, though they may not be aware of it.
  • Short sequence recognition: Ability to recognise a melody i.e. a short progression of notes
  • Long sequence recognition: Ability to relate what happens in a section of music to what happened before and what happens after

    How we listen to music

    There are different ways in which one may attempt to listen to music:
    • Sensuous plane: Listening without thinking, a diversion.
    • Expressive plane: The feeling that the composer is striving to express, or the feeling that the listener feels. The meaning of the music. A controversial topic because of the difficulty in identifying what a musical work expresses.
    • Musical plane: The manipulation of the notes: sequences, combinations, speeds, patterns. This book deals with this plane

    The creative process 

    Music works are composed using different methods. Types of composers include:
    • Spontaneously inspired: Composers begin with a composition that is close to completion. E.g. Schubert
    • Constructive: Continuous refinement of themes. E.g. Beethoven, as deduced from his notes.
    • Traditionalist: Starts with a pattern, rather than a theme. The pattern may be, e.g. the music style of the age/place. E.g. Bach
    • Pioneer: Opposite of traditionalist. Is experimental, adds new harmonies, new principles

    Elements of music

    4 essential elements:
    • Rhythm
    • Melody
    • Harmony
    • Tone color

    Rhythm

    Measured music system: 

    • Rhythmic units are divided into measures separated by bar lines
    • The bar line generally has 4 instants.
    • Number of notes between the bars is used to define the system: E.g. 2/4, 3/4, 5/4, 6/4
    • Stress/Accent: Some notes are stressed/accented (down beat)
    • Meter vs. Rhythm: The stressing of note defines the meter

    History:

    • Measured music system started around 1100 AD. Prior to that most music had rhythm that was based on words (Gregorian chants).
    • End of nineteenth century was when newer features started:
      • Combination meters (2/4 + 3/4) were used e.g. Tchaikovsky
      • Grouping of notes within a bar (2-3-2/8)
      • outside the bar
        • Polyrhythms: Two simultaneous different rhythms, e.g. 2/4 coincides with 3/4
        • Sometimes with non coinciding first beats (length of musical unit is different?). E.g. one rhythm is 2/4, which overlaps with 3/4
        • Frequently used in Chinese, Hindustani, African music, madrigals (rhythms from words)

    Melody

    • Progression of notes in time, has a skeletal frame
    • Exists within a scale system
      • Scale: Set of notes between a tone and its octave
      • Octave: 12 equal semitones
      • C C# D D# E F F# G G# A A# B C
    • Chromatic scale
      • 12 semitones, i.e. all notes
      • C C# D D# E F F# G G# A A# B C
    • Diatonic scale
      • 7 tones chosen from the 12 semitones: 2 whole tones, half tone, 3 whole tones, half tone
      • 12 possibilities, starting with each semitone
      • Starting tone is called the key or tonic
      • Key may be major or minor mode (?): 12 scales in major mode, 12 in minor mode
      • C D E F G A B C
    • Four scale systems:
      • Oriental, Greek, Ecclesiastical, Modern
    • Scales center around the tonic. Next in importance are the 5th degree (dominant) and the 4th degree (subdominant); the 7th degree is the leading tone (leads to the tonic)
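
The whole tone / half tone pattern of the diatonic (major) scale can be applied mechanically to the chromatic scale. A minimal sketch, assuming sharps-only note names (so some keys are spelled differently from conventional notation, e.g. A# where a score would write B flat):

```python
CHROMATIC = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# 2 whole tones, half tone, 3 whole tones, half tone (in semitones)
MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]

def major_scale(tonic):
    """Walk the major-scale step pattern upward from the tonic to its octave."""
    index = CHROMATIC.index(tonic)
    notes = [tonic]
    for step in MAJOR_STEPS:
        index = (index + step) % len(CHROMATIC)
        notes.append(CHROMATIC[index])
    return notes

print(major_scale("C"))
print(major_scale("G"))
```

Starting from each of the 12 semitones gives the 12 major scales mentioned above.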

    Harmony

    Started in the ninth century
    • Organum: Same melody repeated at a 4th or 5th interval above or below
      • Interval: Distance between two notes
    • Descant: Two independent melodies moving in opposite directions
    • Faux bourdon: Intervals of 3rd and 6th
    • All chords are built from the tonic, upwards in a series of intervals of a 3rd
    • Triad chord: 1-3-5, 7th chord:  1-3-5-7, 9th (1-3-5-7-9), 11th (1-3-5-7-9-11), 13th (1-3-5-7-9-11-13)
    • Return to the tonic is a principle in all early harmonic work
    • More recent developments:
      • Atonality: Feeling of central tone lost (Wagner), Abandoning tonality (Schoenberg, Debussy). Opens questions of consonance, dissonance
      • Polytonality: Use of multiple tones (right hand plays in one key, left hand in the other)
      • Most work today is diatonic and tonal
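
Building chords upward from the tonic in intervals of a 3rd amounts to taking every other degree of the scale. A small sketch using the C major scale (the function name is made up for the example):

```python
C_MAJOR = ["C", "D", "E", "F", "G", "A", "B"]

def stacked_thirds(scale, count):
    """Take every other scale degree from the tonic: 1-3-5-7-9-11-13."""
    return [scale[(2 * k) % len(scale)] for k in range(count)]

print(stacked_thirds(C_MAJOR, 3))   # triad
print(stacked_thirds(C_MAJOR, 4))   # 7th chord
print(stacked_thirds(C_MAJOR, 7))   # 13th chord
```

Degrees above the 7th wrap around into the next octave, so the 9th, 11th and 13th are the 2nd, 4th and 6th notes of the scale an octave higher.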

     Tone color (or timbre)

    • Quality of sound from the medium e.g. musical instrument, or voice
    • There is a characteristic way of writing for each instrument
    • Single tone colors: Sections of an orchestra
      • Strings: Violin, viola, cello, bass
      • Woodwind: Flute, oboe, clarinet, bassoon
      • Brass: Horn, Trumpet, Trombone, Tuba
      • Percussion: Drums
    • Mixed tone colors:
      • Combination of single tone instruments
      • String quartet: 2 violins, viola, cello
      • Melodic line passes from one section to another in an orchestra
    • Jazz: Some instruments provide rhythm (piano, bass, percussion), other harmonic texture, one solo instrument plays the melody

    Music Texture

    • Monophonic: Single melodic line, No harmony. E.g Chinese, Hindustani, Gregorian chants
    • Homophonic: Principal melodic line + Chordal accompaniment
      • Contrapuntal view: Two separate melodies progressing in time
    • Polyphonic: Separate and independent voices in the chordal progressions
      • 2-3 polyphonic voices can be perceived independently
      • E.g. Choral prelude (Bach), Jesu

    Music structure


    • Structural background of a lengthy piece of music. Various structures (sonata, fugue) have evolved over years.
    • Sections have a hierarchy. 
      • Large sections denoted by upper cases (A-B-C etc) called movements or sections
      • Smaller sections denoted by lower case (a-b-c..). Analogous to sections and chapters in a book. The classification is made based on how repetition happens

    Larger sections:
    • Exact repetition
    • Sectional (Symmetrical) repetition: 2 part, 3 part, rondo, free sectional
    • Variation: Basso ostinato, passacaglia, chaconne, theme and variations
    • Fugal: Fugue, Concerto grosso, Chorale prelude, Motets & madrigals
    • Development: Sonata
    • Free
    Smaller sections:
    • Exact: a-a-a-a
    • Minor alterations: a-a'-a''-a'''
    • Repetition after digression: a-b-a, a-b-a'
    • Non repetition: a-b-c-d

    Fundamental forms I: Sectional form

    Work is divided into distinct sections
    • 2 part form: A-B-A-B. E.g. Scarlatti's sonatas, No 413 (D minor), 104 (C major), 338 (G minor)
    • 3 part form: A-B-A. B is sometimes called the trio, A is the minuet. Nocturne, ballad, elegy, waltz and intermezzo are likely to be 3 part forms. E.g. Minuets of Haydn (String quartet, Op 17, No 5) and Mozart. Beethoven's Scherzo (Piano Sonata Op 27 No 2)
    • Rondo: A-B-A-C-A-D-A-.... i.e. sections separated by return to A. E.g. Haydn's Piano Sonata No 7 in D Major
    • Free sectional form: Any arrangement, e.g. A-B-B, A-B-C-A Chopin's Prelude in C Minor, No 20

    Fundamental forms II: Variation form

    Piece is composed as a set of variations on a theme:
    • Basso ostinato: Short phrase repeated over and over in the bass section, while the upper parts proceed. E.g. Soldier's violin from Stravinsky's The Story of a Soldier
    • Passacaglia: Repeated bass part, but the bass part is a melodic phrase, not a figure, with some variation in each section. The work starts with an unaccompanied bass theme. E.g. Bach's organ Passacaglia in C minor
    • Chaconne: Very similar to the Passacaglia, but with no starting unaccompanied bass theme, so it sounds like the first variation of a Passacaglia. E.g. last movement of Brahms's Fourth symphony
    • Theme and variations: Variation of a simple, direct theme. Theme is usually a 2 or 3 part form. Five types of variation: Harmonic, Melodic, Rhythmic, Contrapuntal, and a combination of these. E.g. Mozart's A major Piano Sonata: Theme and six variations. Variation 1 is a florid melodic variation, Variation 4 is a skeletonizing of the harmony, Variation 3 is a major key to minor key harmony change

    Fundamental forms III: Fugal form

    • Polyphonic/Contrapuntal in texture: Separate strands of melody concurrently. Needs repeated listening to be able to acquire the skill to differentiate the strands. Types of contrapuntal devices:
      • Imitation: Voices follow a leader, may enter at a different note. Only one melody, but spaced in time.
      • Canon: Imitation from beginning to end of piece
      • Inversion: Melody inverted, one voice follows the melody in the opposite direction. E.g. when the original moves one octave upward, the inverted one moves an octave downward
      • Augmentation: Doubles the time value of notes, slowing the melody down
      • Diminution: Halves the time values of notes
      • Cancrizans: Melody read backward
      • Inverted cancrizans: Melody backward, then inverted
    • Types:
      • Fugue proper: 3-4 voices
        • First voice enters, second voice enters, first voice adds a counter melody, then starts a free voice
        • Exposition, Subject, Subject, ...Stretto, Cadence
          • E.g. Bach, Well Tempered Clavichord
      • Concerto Grosso:
        • Two groups of instruments: Large (Tutti) and smaller (Concertino). E.g. Bach's Brandenburg Concerti (6, each having a different concertino)
      • Chorale prelude: Originated in choral works in churches. Melody is kept intact, harmonies are made more complex. E.g. Bach's Orgelbüchlein
      • Motets/madrigals: Choral forms, vocal fugal form. Motet is based on sacred words, madrigal on secular words
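
The contrapuntal devices listed above are simple transformations of a melody, which makes them easy to sketch in code. The representation here is an assumption for illustration: a melody is a list of (pitch, duration) pairs, with pitches as MIDI note numbers and inversion mirrored around the first note.

```python
def inversion(melody):
    """Mirror every pitch around the first note: upward moves become downward."""
    first = melody[0][0]
    return [(2 * first - pitch, dur) for pitch, dur in melody]

def augmentation(melody):
    """Double the time value of every note."""
    return [(pitch, dur * 2) for pitch, dur in melody]

def diminution(melody):
    """Halve the time value of every note."""
    return [(pitch, dur / 2) for pitch, dur in melody]

def cancrizans(melody):
    """Read the melody backward."""
    return list(reversed(melody))

def inverted_cancrizans(melody):
    """Read the melody backward, then invert it."""
    return inversion(cancrizans(melody))

theme = [(60, 1.0), (62, 1.0), (64, 2.0)]   # C, D, E as MIDI note numbers
for device in (inversion, augmentation, diminution, cancrizans, inverted_cancrizans):
    print(device.__name__, device(theme))
```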

    Fundamental forms IV: Sonata form

    • 3 or 4 movements (fast-slow-fast, or fast, slow, moderately fast, very fast)
      • Created by Karl Bach (JS Bach's son) (Prior to Bach, a sonata was an instrumental work, contrasting with the vocal cantata)
      • 1st movement: Sonata Allegro:
        • 3 parts (ABA):
        • Exposition (abc): First theme is in the tonic, dramatic; second theme is feminine, in the dominant; closing theme is in the dominant
        • Development: Free section, combines material in the exposition, new and foreign keys
        • Recapitulation: Repeats the exposition, but with the themes returning to the tonic key
      • 2nd movement: Slow movement, may be a slow Rondo
      • 3rd movement: Minuet or scherzo, A-B-A, three part form
      • 4th movement: Extended rondo or in sonata allegro
      • Sometimes preceded by introduction and followed by a coda. E.g. Beethoven's Waldstein Sonata
    • Symphony: Sonata for orchestra: E.g. Beethoven's 9 symphonies
    • String quartet: Sonata for 4 strings
    • Concerto: Sonata for solo instrument + Symphony
    • Overtures: First movements of a sonata

    Fundamental forms V: Free forms

    Do not belong to the above structures
    • E.g. Preludes (for Piano). E.g. Bach's prelude, fugue
      • Clear progression of chordal harmonies from beginning to end without repetition of any themes. E.g Bach's B minor Prelude in Well Tempered Clavichord
    • E.g. Symphonic poems: Program music (as opposed to absolute music)

    Tuesday, December 25, 2012

    Designer genes - How the forces of natural selection are about to be replaced by the forces of human selection - Stephen Potter

    Designer genes - How the forces of natural selection are about to be replaced by the forces of human selection - Stephen Potter, 2010


    This book surveys developments in genetics over the last fifty years, in particular developments which have led towards the possibility of genetic engineering of humans. These include:
    • The double helix model of the DNA (Watson/Crick, Nobel prize 1962)
    • The sequencing of the human genome: DNA sequencing (Sanger/Maxam/Gilbert, Nobel prize 1980), protein sequencing (Sanger, Nobel prize 1958), Human Genome Project (Collins/Venter)
    • PCR: Creation of a large number of copies of a DNA sequence (Mullis, Nobel prize 1993)
    • Stem cells: Conversion of adult cells to stem cells (Yamanaka, Nobel prize 2012), allowing creation of multiple embryos by turning stem cells into gametes
    • Modification of the genes of a cell (Capecchi, Evans, Smithies, Nobel prize 2007)
    The book then discusses the ethical implications of technology that commercializes and combines this research to allow human genetic engineering.

    The current state of genetic engineering and what might be possible in the next few years:

    The following technologies have been demonstrated in research. Commercialization of these technologies to reduce cost and increase speed is underway and advances are expected in the next few years:
    • Preimplantation Genetic Diagnosis (PGD): Identifying the presence or absence of a particular gene in an embryo at an early stage (8 cell blastocyte), by extracting a single cell from an 8 cell embryo from IVF.
    • Complete DNA sequencing on an embryo cell in a short time (order of hours)
    • Creation of thousands of embryos simultaneously through stem cells
    • Screening of multiple embryos in parallel
    • Modification of the genes of an embryo cell followed by implantation

    Which genes do what

    • DNA: A long molecule consisting of sequences of bases (A, G, C, T)
    • Codon: A triplet of bases that codes for an amino acid. 4 bases =>4^3 = 64 possible codons
    • Amino acids: Building blocks of proteins:
      • 20 possible amino acids (out of 64 possible codons => 44 codons either do not specify amino acids or more than 1 codon maps to the same amino acid.)
    • Gene: Sequences of codons that code for protein generation
    • RNA: Long molecule consisting of bases ( A, G, C, U)
      • RNA can act as genetic material as well as a catalyst, like a protein. Might explain origin of life (Altman, Cech (Nobel prize 1989))
    • Proteins: Chain of amino acids (few hundred) => more than 20^100 possible proteins
    • Generation of proteins from DNA:
      • Transcription: DNA used to generate mRNA, aided by protein RNA polymerase
        • RNA polymerase is a protein that takes one strand of DNA and transcribes a RNA copy (only the genes)
      • Translation: mRNA to Protein, happens in the ribosomes of the cell
    • Gene differences
      • Mutations: Deletion of block in DNA sequence-> Deletion of block in protein
      • Frame shift: Deletion of single base
      • Single base difference: Single nucleotide polymorphism (SNPs):
        • Can be in codons or non coding part
      • An individual has 2 copies of each gene: One from each parent
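
A toy sketch of the DNA -> mRNA -> protein pipeline and of a frame shift. Two simplifications are assumed: transcription is modelled as copying the coding strand with U in place of T (RNA polymerase actually reads the complementary template strand, which produces the same result), and only four entries of the standard genetic code are included, so unknown codons are shown as "?".

```python
from itertools import product

CODONS = ["".join(c) for c in product("ACGU", repeat=3)]   # all 4^3 triplets

# Small excerpt of the standard genetic code; the full table has 64 entries.
CODE_EXCERPT = {"AUG": "Met", "UUU": "Phe", "GGC": "Gly", "UAA": "STOP"}

def transcribe(coding_dna):
    """mRNA carries the coding strand's sequence with U in place of T."""
    return coding_dna.replace("T", "U")

def split_codons(mrna):
    """Cut the mRNA into complete triplets, dropping any leftover bases."""
    return [mrna[i:i + 3] for i in range(0, len(mrna) - 2, 3)]

def translate(mrna):
    """Map codons to amino acids until a stop codon is reached."""
    protein = []
    for codon in split_codons(mrna):
        amino = CODE_EXCERPT.get(codon, "?")
        if amino == "STOP":
            break
        protein.append(amino)
    return protein

gene = "ATGTTTGGCTAA"
shifted = gene[:3] + gene[4:]          # delete a single base: frame shift
print(len(CODONS))
print(translate(transcribe(gene)))
print(split_codons(transcribe(shifted)))
```

Deleting one base changes every codon after the deletion, which is why a frame shift is usually far more damaging than a single base substitution.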

    Sequencing the human genome

    • Genome: 3 billion bases
    • DNA per protein: 3 bases per amino acid, 100 -1000 amino acids per protein
      • => 3 billion/3000 = 1 million potential proteins or genes per genome
    • Turns out there are only 30K genes in the human genome
      • => Approx 2.5%
      • Determined by transcription analysis of RNA
    • Differences in people = 0.1% of the genome i.e 3 million bases out of 3 billion
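
The back-of-the-envelope arithmetic above can be written out as follows. The inputs are the round figures from the notes; with them the coding share comes out at about 3%, the same order as the approx 2.5% quoted, and the gap presumably reflects rounding in the inputs.

```python
GENOME_BASES = 3_000_000_000
BASES_PER_GENE = 3 * 1000        # 3 bases per amino acid, up to ~1000 amino acids
GENE_COUNT = 30_000

potential_genes = GENOME_BASES // BASES_PER_GENE
coding_fraction = GENE_COUNT * BASES_PER_GENE / GENOME_BASES
differing_bases = GENOME_BASES // 1000     # 0.1% of the genome

print(potential_genes, coding_fraction, differing_bases)
```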

    Sequencing revolution

    • Objective: Identify gene combinations (from the millions of SNPs in a single DNA sequence), that contribute to variation/disease
    • Map disease/traits to SNPs
    • 3 million differences between individuals
      • => Needs huge amount of data (samples from large population, mapped to diseases), computational power
    • As the complexity of the genetic cause of a variation increases (more SNPs), the number of samples needed to identify it increases

    Time scales

    • Genetic evolution can be rapid: Order of 1000s of years: Rapid, directed
    • E.g. Evolution of dogs from wolves directed by man

    Gene expression

    • Transcription factors: Proteins that regulate transcription i.e expression of genes
      • Like all proteins, their generation is impacted by the genome
      • Can cause genetic cascades: Hundreds of genes altered in expression level
    • Introns: Intervening sequences: non coding sequences that interrupt the coding sequences of a gene
      • Increased flexibility of gene expression
      • Introns are removed from the RNA transcript, a process called RNA splicing
      • Exons: Expressed sequences
      • 2% of sequence are introns that regulate gene expression
    • Genetic regulatory network: Interwoven connection of genes, with some regulating
      • 3% of the genome is regulatory

    Jumping genes

    • Transposable elements: Discrete parts of the chromosome, capable of moving from one chromosome position to another (McClintock)
      • Enzymes allow the transposable elements to copy themselves, float around and attach to a new place in the DNA sequence
      • Drosophila melanogaster: P elements detected in wild fruitfly DNA, which was different from DNA extracted from fruitflies a few years previously
    • Horizontal gene transfer: Viral DNA: Retroviruses can convert RNA to DNA using enzyme called reverse transcriptase (Temin/Baltimore, Nobel 1975)
      • Causes hybrid dysgenesis i.e. reduced fitness
      • Repressors
    • Transduction: Moving DNA material from one species to another
      • Balance between harmful effects (which are subject to survival of the fittest) and preferential replication. Can spread because of ability to outreplicate the competing genome sequences.
    • DNA: 2% coding, 3% gene expression, 30% parasitic transposable
      • Why is there >50% with no known purpose: Has evolution created this unused portion to mitigate the effects of transposable DNA?

    Genetic disease

    • Every gene has two copies - one from each parent
    • A mutant gene in both parents (one copy each) => child has a 25% chance of inheriting two bad copies
    • Nature vs. nurture: Minnesota twin study: 70-80% of IQ is genetic
    • Genes have a surprising amount of contribution to psychological traits: love, faith

    Embryo

    • Gametes: Have 23 chromosomes
    • Chromosome: Long DNA molecule
      • Individual has 23 pairs of chromosomes: one set from each parent
      • One copy of each chromosome goes into sperm/egg during meiosis => 2^23 possible combinations per gamete, i.e. 2^46 combinations from a pair of parents
      • Identical twins: Monozygotic (single egg, fertilized by single sperm, divides after blastocyst stage)
      • Fraternal twins: Dizygotic (two eggs fertilized by different sperm)
      • Chimera: Fusion of multiple eggs (fusion of two dizygotic embryos)
    • When number of cells < 32 (?), cells can develop into any type of cell

    Stem cells

    • Origin: 
      • Embryo cells
      • Adult stem cells: Bone marrow cells
    • Programming adult cells to become stem cells
      • History of development:
        • Gene expression appears to be controlled by a master switch and a genetic pyramid of hierarchy.
        • Genes at the top of the hierarchy control those below them
        • Homeobox controls the master blueprint i.e type of the species, e.g. fruitfly vs. mouse
        • Single genes can initiate extensive development programs, e.g. drive the growth of a leg
      • Genes can make adult cells revert to stem cells
        • 4 genes activated in a mouse cell reverted it to a stem cell (Yamanaka, Nobel 2012)
      • Implication: Egg cells can be created, increasing the number of samples available for selection (over the 500 created)

    Gene modification

    • Technology to modify genes (Capecchi, Evans and Smithies, Nobel 2007)
    • Procedure:
      • Desired version of the gene is created in a test tube
        • Generated using DNA synthesis machine or recombinant DNA strategies
      • Synthetic gene introduced into stem cells grown in an incubator
        • Modified gene introduced into stem cells by electroporation
        • Process of change is not clearly understood, happens by DNA recombination similar to meiosis
      • One of these correctly engineered stem cells is used
        • Only 1 in a million stem cells can be used, need screening to detect the stem cells which are good
        • Screening done by polymerase chain reaction (PCR), similar to the procedure used in DNA matching
        • PCR creates a large number of copies of the DNA sequence (Mullis, Nobel Prize 1993)
      • Genetically altered stem cells added to a blastocyst
    • Stem cell cloning: Dolly the sheep (announced 1997), cloned from the nucleus of a mammary gland cell

    Ethical questions

    • Ontogeny recapitulates phylogeny: Development of the individual copies evolutionary history
      • E.g. In embryos a primitive pair of kidneys is formed followed by a more advanced pair, and then the final pair
      • Is the early embryo truly human?
    • Optimal gene combinations
      • Sickle cell gene provides resistance to malaria: Genes are tradeoffs, not binary decisions
      • Connection between artistic genius and mental illness
    • Appearance of the human variant of the FOXP2 gene, linked to speech (the chimpanzee version differs), coincides with an explosion in the rate of progress.
    • Is genetic engineering any different from eugenics, i.e. improvement of the gene pool via human selection?

    Thinking, Fast and Slow, Daniel Kahneman

    Thinking, Fast and Slow, Daniel Kahneman, 2011


    Kahneman's work covers areas of behavioural economics, specifically prospect theory (how decisions are made when outcomes are probabilistic) and the effects of cognitive biases on choice. It explains the process by which people reach conclusions and make decisions, and why the choices are often the wrong ones. The book has several examples that illustrate errors of judgement and choice in analytical situations, mainly the results of cognitive biases. It covers research by Kahneman and others from the 1970s to the present. Prior to this work, economists and social scientists assumed in their research that people are rational, and departures from rationality were believed to be functions of emotion. Kahneman's research claimed that departures from rationality are caused by flaws in the cognitive machinery, i.e. cognitive biases. His work describes how the mind works based on recent developments in psychology. The mind is subject to the influence of heuristics, intuition and biases, and its functioning can be explained by three models:
    •  A model of the mind consisting of two components:
      • System 1: Fast automatic thinking: By intuition or by expertise
      • System 2: Slow engaged thinking: Deliberation, algorithmic, measured
    This model explains how and why humans reach erroneous conclusions when presented with simple mathematical choices. The book describes 10-15 heuristics and biases which cause System 1 to reach erroneous conclusions.
    • Two economic models of human behaviour called Econs (rational, selfish and invariant in tastes) and Humans (real people). Modern economic theory/modelling is based on Econs, which explains why economic models to date are flawed.
    • The Experiencing Self and the Remembering Self: Two ways in which humans consider events. Decisions rely on memories, which can lead to incorrect decisions because past experiences are assessed incorrectly.
    The work uses these models to illustrate how modern economic models are flawed and how human decision making is flawed when evaluating decisions involving risks.  

    Part I: This section describes the systems.

    • The two systems:
      • System 1: Operates quickly, with no effort and no voluntary control
      • System 2: Deliberate, requires attention, can reprogram System 1 for a specific task
      • The division of labor between the two systems maximizes performance and minimizes effort.
    • Attention/Effort
      • It takes effort for System 2 to get engaged.
      • Law of least effort: A person will engage the system that allows the task to be performed with least effort.
      • Experts in any field are able to solve problems in their field using System 1.
    • Lazy control
      • System 2 is engaged less often than it should be, because of "laziness"
      • Cognitive load: Load placed on the mind because of System 2 being engaged in one task.
      • Ego depletion: Depletion of self control causes System 1 to be engaged because of cognitive load on System 2 on another task.
      • The nervous system consumes more glucose than most of the rest of the body
      • Unless explicit effort is made, an individual will favor using System 1 without engaging System 2
      • System 2 can be divided into 2 components:
        • Intelligence: IQ
        • Rationality: Immunity to bias
    • Association
      • Association (ideas or suggestion) affects System 1's perceptions/decisions
      • Priming affects System 1's perception/decision
    • Cognitive ease/Cognitive strain
      • Measure of an individual's current condition, can predict the likelihood of using System 1 vs. System 2
      • When in a state of cognitive ease, System 1 predominates
      • Cognitive ease can be brought on by association, priming
      • Cognitive strain can be brought on by associated difficulties (e.g. bad fonts)
    • Norms, causes
      • Past events can cause System 1 to believe in a norm, i.e. a stereotype or perception of normal behavior
      • The mind has a need to assign causality to events
      • System 1 is incapable of making correct conclusions about causality - it does not have the ability to think statistically
    • How conclusions are reached by System 1
      • Confirmation Bias: A deliberate search for confirming evidence
      • Halo effect: Tendency to reach erroneous conclusions in one dimension based on liking a person for another dimension
      • Limited evidence (WYSIATI, "what you see is all there is"): base-rate neglect, framing effects, overconfidence
    • How judgments happen in System 1 when inadequate information is provided
      • Neglect of information, use of basic assessments
    • How questions are answered:
      • Substitution: When faced with a difficult question, individuals use a heuristic to find a simpler question that can be answered and substitute it for the original
      • Affect heuristic: Likes and dislikes determine beliefs about the world

    Part II: Heuristics and Biases: This section lists a number of biases/heuristics/intuitive conclusions which cause System 1 to reach erroneous conclusions.

    • Law of small numbers:
      • Even researchers make mistakes on sample size: Sample size is low, even in research experiments. A small sample will exaggerate the effect of outliers.
      • System 1 believes it can see order, where randomness exists
      • Causal explanations of chance events are invariably wrong
      • Solution: When conducting experiments: Decorrelate results by averaging
    • Anchors
      • Providing an anchor when asking a question can influence the response: E.g. would you contribute $100 to this cause? If not, how much?
    • Availability
      • Availability of the memory of events can influence perception of frequency of the events
      • Difficulty in recalling a large number of events can lower the perceived frequency, even if the absolute number recalled is higher
    • Impact of availability
      • Emotional tail wags the rational dog
      • Availability bias attempts to create a world that is simpler than reality
      • Availability cascade: An emotional response to available events feeds on itself and results in bias flowing into public policy
    • Representativeness bias:
      • Stereotyping used without examination of bias, or stats about accuracy of stereotypes
      • Base rate information tends to be neglected when specific instance information is available
      • Always apply Bayesian analysis
    • Representativeness bias with varying degrees of information
      • System 1 often judges a more specific condition (smaller population) to be more likely than a more general condition (larger population) because it is more representative
    • Causes vs Statistics
      • Base rates are ignored, even causal statistics may not change deeply held beliefs
    • Regression to mean
      • Regression to the mean is often interpreted as a causal event
      • Regression and correlation are related concepts. Where correlation is not perfect, there will be regression to the mean
    • Taming intuitive predictions
      • Use correlation to obtain a prediction that lies between an intuitive prediction and the base rate
      • Unbiased predictions will not predict extreme cases, unless a lot of information is available
      • In some cases, such as venture capital, this may  be detrimental because they are searching for extreme cases
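Two of the remedies in this part, "Always apply Bayesian analysis" and "Taming intuitive predictions", reduce to short formulas. The sketch below is illustrative only: the function names, the GPA numbers and the witness numbers are assumptions chosen for the example, not figures from these notes.

```python
def posterior(prior: float, hit_rate: float, false_alarm_rate: float) -> float:
    """Bayes' rule for a yes/no hypothesis after one positive piece of evidence.

    prior            -- base rate of the hypothesis
    hit_rate         -- P(evidence | hypothesis true)
    false_alarm_rate -- P(evidence | hypothesis false)
    """
    evidence = prior * hit_rate + (1.0 - prior) * false_alarm_rate
    return prior * hit_rate / evidence


def regressed_prediction(base_rate: float, intuitive: float, correlation: float) -> float:
    """Start at the base rate and move toward the intuitive estimate
    in proportion to the correlation between evidence and outcome."""
    if not 0.0 <= correlation <= 1.0:
        raise ValueError("correlation is expected to lie between 0 and 1")
    return base_rate + correlation * (intuitive - base_rate)


# Hypothetical numbers: 15% base rate, witness right 80% of the time
print(round(posterior(0.15, 0.8, 0.2), 3))

# Hypothetical numbers: average GPA 3.0, gut-feel prediction 3.8, correlation 0.3
print(round(regressed_prediction(3.0, 3.8, 0.3), 2))
```

With a correlation of 0 the prediction stays at the base rate, and with a correlation of 1 it equals the intuitive estimate, which matches the bullet about the prediction lying between the two.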

    Part III: Overconfidence: Other reasons System 1 makes mistakes

    • Illusion of understanding
      • The mind creates an illusion of understanding by believing WYSIATI
      • Hindsight bias creates the illusion that outcomes were obvious and that decisions were obvious
      • Outcome bias affects the perception of decisions based on the results
      • Halo effect affects the perception of human decisions based on organization outcomes
    • Illusion of validity: A cognitive illusion
      • The illusion of skill/validity
      • Supported by a powerful professional culture
      • Hedgehogs and foxes: hedgehogs fit events to a single framework and predict based on that
      • Media favors appearance of hedgehogs in debates
    • Intuition vs Formulas
      • System 1 is influenced by several factors (priming etc. above)
      • The result is that statistical prediction will generally outperform human expert prediction (Meehl, Clinical vs. Statistical Prediction)
      • Humans tend to try to think outside the box, adding complexity that usually reduces accuracy
      • When predictability is poor, inconsistency (generated by System 1) destroys predictive validity
      • Broken leg rule: Occurrence of outlier events impacts prediction
      • Combining predictors with equal weights (averaging them) is often as good as a multiple linear regression algorithm
    • When can we trust expert intuition
      • Other school of thought: Naturalistic Decision Making: Seeks to understand how intuition works (Gary Klein, Sources of Power)
      • Intuition: System 1 implements rapid pattern recognition with System 2 executing a deliberate process to make sure that the decision will work
      • Requirements:
        • An environment that is regular enough to be predictable
        • Prolonged practice at identifying the regularities
        • E.g. chess players can rapidly and intuitively recognize a situation as weak or strong, but this needs approx. 6 years of practice at 5 hrs/day
    • The outside view
      • Inside view vs. Outside view: Knowledge about an individual case makes an insider feel no need for the statistics of the case
      • Exhibited as a belief in the uniqueness of the case
      • Planning fallacy: Plans and forecasts are unrealistically close to the best case
    • The engine of capitalism
      • Irrational optimism: Optimistic bias plays a dominant role in risk taking
      • Overconfidence in one's own forecast: An effect of System 1 and WYSIATI
      • Remedy: Prepare a premortem for all decisions: Assume that the decision, once made, resulted in a disaster, then write its postmortem

    Part IV: Choice: What influences human choice

    • Bernoulli's errors
      • Humans vs. Econs
        • Econs: Rational, Selfish, Maximize utility, Tastes do not change
      • Utility theory (Bernoulli)
        • Prior to Bernoulli, gambles were compared based on their expected values
        • Bernoulli realized that people dislike risk and explained this by the diminishing marginal value of wealth
        • Assigned a utility to each value of wealth, with the increase in utility decreasing as wealth increases
        • Diminishing returns
        • Explains insurance: Risk is transferred from a poorer person (with higher loss of utility) to a richer person (lower loss of utility)
    • Prospect theory:
      • Utility theory has a flaw: Utility is not absolute, it depends on the reference point
      • Difference in utility depends on direction: A loss of $500 has a greater negative utility than the positive utility of a gain of $500
      • Depends on increase/decrease: E.g. $5M has a different utility if it is considered in the context of an increase from $1M to $5M or a decrease from $10M to $5M
      • Taking this into account results in different predictions for how willing a poor or rich person is to take risk
      • Conclusion: If all options are bad, people tend to prefer gambling/risk taking, else they avoid risk
      • Prospect theory
        • How financial decisions are made:
        • Evaluation is relative to a reference point: the status quo
        • Diminishing sensitivity in the evaluation of changes
        • Loss aversion
        • Gain/loss vs. psychological utility is an S curve, but not a symmetric curve
        • Problems: Does not account for regret, disappointment
    • Endowment effect
      • Decisions are impacted by whether a good is meant for exchange or for use
      • Psychological value of a good for use, such as a mug or an already possessed good can change the utility of selling it
    • Bad events
      • Loss aversion is with respect to a reference point
      • Not achieving a goal may be a loss, exceeding a goal may be a gain
      • Impacts negotiations, where parties fight harder to avoid losses than to make gains
      • In a negotiation both parties feel they have lost more than gained
      • The asymmetry between the feeling of gain/loss impacts the feeling of fairness: Can impact whether customers choose to buy products whose prices have risen
      • Fairness: It is considered unfair to impose losses on a customer, relative to his reference point
      • Reference points cause a sense of entitlement
    • Fourfold pattern
      • Overweighting of small probability events
      • Decision weights are not identical to probabilities
      • => Expectation (weighting by probability) is a flawed model of choice
      • Decisions are made based on decision weights, not probabilities
      • Decision weight d = probability p at p=0 and p=1, but d ≠ p for all other values (d > p for small p, d < p for large p)
      • Possibility effect near p=0 and certainty effect near p=100%: weights change most sharply at the extremes
      • Fourfold pattern: Gain/Loss vs. High/Low probability
      • The fourfold pattern shows how high/low probability of a gain or loss results in acceptance/rejection of unfavorable/favorable outcomes in negotiations because of the aversion to loss/hope of gain and consequent risk taking/aversion
    • Rare events
      • People overestimate probabilities of unlikely events
      • People overweight unlikely events
      • Vivid or alternative descriptions of events influence decision weights (1 in 1000 vs. 0.1%)
    • Risk policies
      • People tend to be risk averse in gains and risk taking in losses
      • Broad framing (the grouping of several decision problems into a single problem) can result in better decisions than narrow framing (separately deciding each problem).
      • Samuelson's problem: Aversion to a single gamble vs expected value of several hundred instances of the gamble
      • Since a life will consist of several such small gambles, it pays to take the small gambles
        • Gambles must be independent experiments
        • Gambles must not be excessive
        • Gambles must not be long shots
      • Loss aversion + narrow framing leads to bad (risk averse) decisions
      • E.g. individual managers are risk averse because they take individual decisions. A CEO frames the decisions broadly, and favors taking a risk, in the hope that statistically one of them will pay off
    • Keeping score
      • Disposition effect: A product of narrow framing: E.g the tendency to sell winning stock in preference to losing stock, because of the pain caused by acknowledging and closing a losing stock.
      • Sunk cost fallacy: Tendency to throw good money at a bad project in the hope of salvaging it
      • Regret/blame: People have stronger reactions to an outcome produced by action than to the same outcome produced by inaction (regret)
      • There is an aversion to trading increased risk for any other advantage, even if the advantage is significantly more gainful than the risk
      • Regret/hindsight bias cause regretful feelings when a moderate amount of thought has gone into decisions
      • Think deeply and anticipate regret, or think little.
    • Reversals
      • Preference reversals: Preference can change when two choices are compared jointly vs. if they are presented singly
      • Frames and Reality
      • Losses cause stronger negative feelings than cost
      • Framing a decision can impact decisions: gallons per mile vs. miles per gallon
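Two ideas from Part IV can be checked numerically: the asymmetric S curve of prospect theory, and Samuelson's problem (a single gamble is rejected, many independent ones are attractive). This is a minimal sketch. The curvature 0.88 and loss-aversion coefficient 2.25 are commonly quoted estimates and are assumptions here, as is the win-$200/lose-$100 coin flip; none of these numbers appear in the notes above.

```python
from math import comb

ALPHA = 0.88   # assumed curvature: diminishing sensitivity to larger changes
LAMBDA = 2.25  # assumed loss-aversion coefficient: losses loom larger than gains


def value(x: float) -> float:
    """Psychological value of a gain or loss x measured from the reference point."""
    if x >= 0:
        return x ** ALPHA
    return -LAMBDA * (-x) ** ALPHA


def prob_net_loss(n: int, win: int = 200, lose: int = 100) -> float:
    """Exact probability that n independent 50/50 gambles end with a net loss."""
    # k wins out of n gives a net result of k*win - (n - k)*lose
    losing_outcomes = sum(
        comb(n, k) for k in range(n + 1) if k * win - (n - k) * lose < 0
    )
    return losing_outcomes / 2 ** n


# Narrow frame: one gamble evaluated on its own feels like a bad deal
single_gamble_value = 0.5 * value(200) + 0.5 * value(-100)
print(single_gamble_value < 0)

# Broad frame: 100 such gambles taken together almost never lose money
print(prob_net_loss(100))
```

The single gamble has a positive expected value in dollars but a negative psychological value, which is the loss aversion plus narrow framing combination described above.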

    Part V: Two selves: How memories are assessed

    • Two selves
      • Experienced utility vs. Decision utility
      • Experiencing self vs. Remembering self
      • Experience expresses satisfaction of the whole experience, while remembering may only remember selected parts of the whole experience
      • Peak end rule: The most intense moment and the end of an experience dominate how it is remembered
      • Duration neglect: Durations of experiences are often forgotten while intensity is not
    • Life as a story
      • Duration neglect, peak end rule and the remembering self impact decisions
    • Experienced well being/Thinking about life
      • Measures of happiness reflect the remembering self, not the experiencing self
      • Affective forecasting: The effect of recent significant memories on opinion
      • Focusing illusion: Nothing is as important as you think when you are thinking about it
    • Conclusions
      • System1/System2, Econs/Humans, Experiencing self/Remembering self

    Monday, November 26, 2012

    Crossing the chasm: Marketing and selling high tech products to mainstream customers - Geoffrey Moore

    Crossing the chasm: Marketing and selling high tech products to mainstream customers - Geoffrey Moore, 1991, Revised 2002

    Summary

    This book discusses the difficulties that technology firms face in moving from use by early adopters to mass adoption by the mainstream. The difficulties are discussed in the context of the technology adoption life cycle and the chasm between customers who are early adopters/innovators and those who are the early majority (pragmatists). Navigating this chasm is a stage that differs substantially from both early growth and the later stages. Moore discusses this stage and how to manage it. The book defines the chasm, the nature of customers on either side of it, and how the chasm is to be approached:
    •     Identifying a niche market to attack
    •     Defining the "whole" product
    •     Positioning the product in relation to its competition
    •     Execution of the attack through distribution and pricing
    Finally he concludes with a discussion of how a company must evolve and some of the issues that must be addressed (personnel/compensation) post chasm

    Chapter 1: Chasm defined

    • Four types of customers
      • Early adopter/Innovators, Early majority, Late majority, Laggards
      • The population is distributed in a bell curve
      • Category boundaries fall roughly one standard deviation apart around the mean
      • Normal technology adoption life cycle (TALC) moves from one segment to the next, left to right
    • Innovations are continuous or discontinuous
      • Technology innovations are often discontinuous
      • This has the effect of disrupting the TALC
    • Gaps/cracks between each segment
      • Innovator to early adopter
      • Early adopter to Early majority: Early adopter wants a change agent, Early majority wants a productivity improvement
      • Moving from early adopter to early majority is a move from a market with no reference/support to one with a well defined reference/support model
    • The chasm is the gap between the early adopter/innovator and the early majority
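The bell-curve segmentation above can be made concrete by computing the share of the population in each segment. This is a minimal sketch that assumes the conventional cut points at -2, -1, 0 and +1 standard deviations from the mean adoption time; the notes do not state the cut points, and the names below are illustrative.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


# (segment, lower bound, upper bound) in standard deviations from the mean.
# These boundaries are an assumption, not taken from the notes.
SEGMENTS = [
    ("Innovators", float("-inf"), -2.0),
    ("Early adopters", -2.0, -1.0),
    ("Early majority", -1.0, 0.0),
    ("Late majority", 0.0, 1.0),
    ("Laggards", 1.0, float("inf")),
]


def segment_shares() -> dict:
    """Fraction of the population that falls inside each segment."""
    return {name: normal_cdf(hi) - normal_cdf(lo) for name, lo, hi in SEGMENTS}


for name, share in segment_shares().items():
    print(f"{name}: {share:.1%}")
```

Under these assumptions the shares come out at roughly 2%, 14%, 34%, 34% and 16%, so the chasm sits just before the first large block of customers.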

    Chapter 2: Chasm examined

    • Innovators: Demand extensive information, but will support the product even if the product is half baked
    • Visionaries (Early adopters): Derive value from the strategic leap forward, not the technology itself
      • Expect breakthrough, not improvement
      • Highly demanding, expect the "dream"
    • Pragmatists (Early majority): The large revenues reside with the pragmatists
      • Slow to make decisions, move only when they sense the market is moving
    • Late majority/Laggards

    Chapter 3: Overview: How to cross the chasm

    • Sell to visionaries (early adopters) <-chasm-> Sell to pragmatists (early majority)
    • How to handle the chasm:
      • Take over a niche market
      • Company must be market driven not sales driven
      • Being sales driven in the chasm period is fatal
      • Problem: Leaders like sales driven companies, not market driven companies
      • Provide the "whole" product
    • Market leadership - big fish, small pond
      • Growth will need word of mouth - spreading the customers dilutes this: 10 customers in 10 segments is worse than 3-4 customers in 3 segments
      • Strategic niches
        • Commit to the niche
        • Act locally, not globally
        • Target closed communities
    • Platforms vs. Applications
      • Products must take a vertical approach to cross the chasm, i.e. must become an application
      • Platforms enable mass market adoption - will help once the chasm is crossed

    Chapter 4: Identify the point of attack

    •  This is a High Risk, Low data decision
      • Informed intuition better than analytical reasoning
    • Define target customer characterizations
      • Use case scenarios
      • Market development strategy checklist
        • Target customer, reason to buy, whole product, partners, distribution, pricing, competition, positioning, next target customer
    • Size of the market:
      • Pick on someone your own size
    • Steps:
      • List library of target customer scenarios
      • Analyze, rank and decide
      • Commit to the point of attack

    Chapter 5: Define the product

    • Whole product marketing
      • Whole product can be categorized based on satisfaction of requirements:
        • Generic, Expected, Augmented, Potential
      • A seemingly inferior product may actually be inferior only in the "generic", it may be superior in the "whole"
      • A whole product may need support:
        • Third parties usually do not contribute during the chasm
        • Need to form tactical alliances
    • Markets are an ecology of interrelated interests
    • Steps:
      • Develop a whole product diagram (donut)
      • Develop needed alliances/relationships

    Chapter 6: Define the battle

    • Any force can defeat any other if it can define the battle
    • Create the competition:
      • Locate the product in a buying category which has established credibility with pragmatist buyers
      • Focus on the needs of the pragmatists: Use a Competitive Positioning Compass
        • Opinion/knowledge about technology (Specialist/Generalist)
        • Opinion about proposition (Supporter/Skeptic)
      • Crossing the chasm: Move sales from supporters of the technology/proposition to skeptics of the technology/proposition
      • Move from product metrics (fastest, easiest) to market metrics (largest base, cost)
    • Positioning
      • In people's heads, not in words
      • Pragmatists are conservative about changes in positioning
      • Positioning is about making a product easier to buy, not easier to sell
    • Process: Claim, Evidence, Communications, Feedback/adjustments
      • Pass the elevator test
      • For A, Who are dissatisfied with B, Our product is C, That provides D, Unlike E, We have assembled F
    • Proof: Market share, Alliance
    • Steps:
      • Focus product by defining competition
      • Define position

    Chapter 7: Launch the attack

    • Objective: Secure a channel into the mainstream market with which pragmatists will be comfortable
      • Prioritize above revenue, profits, customer satisfaction
      • Motivate the channel
    • Customer oriented distribution, Distributor oriented pricing
    • Distribution: Direct selling, Retail, OEMs, VARs, System integrators
      • Can the channel create a relationship to the mainstream customer?
      • Direct selling is the best way to create the relationship when crossing the chasm
      • Retail fulfills a demand, rather than creating it
      • VARs provide support
      • Price point between $10K and $75K is the hardest to sell
      • Products need marketing and end support
      • VARs do not expand a market
    • Start with direct selling, move to suitable channel after awareness is created
    • Pricing: Customer oriented, vendor oriented, distribution oriented
      • Customer oriented:
        • Visionaries: High cost: Value based pricing
        • Pragmatists: Competitive based pricing
        • Conservatives: Low cost: Cost based pricing
      • Vendor oriented:
        • Internal costs drive pricing decisions
      • Distribution oriented:
        • Price based on market
        • Price for market leadership
    • Steps:
      • Define the distribution channels
      • Define the pricing model

    Chapter 8: Conclusion

    • Post chasm enterprise bound by the commitments of the pre chasm enterprise
      • Avoid making wrong commitments in pre-chasm stage
      • Post chasm enterprise: Purpose is to make money
        • Stop custom development and roll out generic product
      • Pre chasm enterprise: Purpose is proof of concept of product and small early revenues
        • Typical mistake: Promise of hockey stick growth post chasm
        • Reality: Staircase: cycles of slow growth, stagnancy and then rapid growth, caused by repeatedly crossing chasms in different market segments
    • Venture capitalist concerns:
      • How long till the chasm is crossed? How long before reasonable profit from the mainstream market?
      • Chasm can be crossed only when the whole product is built, may need a long time
        • Technologist: Adopt discipline of profitability from day one
          • Except:  When high entry barrier exists
          • Rapid development needed (land grab)
    • Composition of company needs to be different before and after the chasm in Engineering/Sales
    • Navigating the chasm: May need reorgs with new job descriptions to handle the shift

    Monday, October 22, 2012

    "DNA: A graphic guide to the molecule that shook the world" - Israel Rosenfield, Edward Ziff and Borin Van Loon

    "DNA: A graphic guide to the molecule that shook the world" - Israel Rosenfield, Edward Ziff and Borin Van Loon, 2011

    The book discusses the DNA molecule including
    • A historical background of genetics prior to the discovery of the molecular structure of DNA
    • Chemical structure
    • Information storage
    • Information expression
    • Replication
    • Diversity
    • Related topics including cloning, sequencing, stem cells, epigenetics and  the origin of life.

    The topics are covered in chronological order of discovery, with extensive background on the researchers involved in the discoveries. The following key discoveries are discussed in detail:
    • Watson Crick double helix model of the DNA molecular structure
    • Crick's Adapter Hypothesis for information expression
    • Operon model of regulation (Jacob/Monod) for information expression

    Background:

    Modern genetics begins with the discovery of the molecular structure of DNA. Prior to this, researchers had concluded that cell division along with fusion of nuclei was responsible for the transmission of genetic information. It was known through Mendel's research that phenotypes (observed properties) were the result of genotypes (genetic makeup) and that traits were randomly segregated during reproduction. A substance known as chromatin, which contained chromosomes, was known to originate from the nucleus. Chromatin was suspected to be involved in heredity, but it was believed that protein sequences (and not chromosomes/chromatin/DNA) were the primary mechanism for information storage/transfer.

    Chemical structure

    DNA is a molecule that consists of bases, phosphodiester bonds and a sugar backbone.
    • Sugar backbone: The backbone is a chain of sugar molecules. The sugar molecules consist of C, H and O atoms connected by single bonds in a 5 sided ring structure (Ribose). The molecules are identical for RNA and DNA except that the RNA sugar has OH in one location, while the DNA sugar has H in that location (hence Deoxy-Ribose). Each sugar molecule is connected to the next sugar molecule via a phosphodiester bond. Each sugar molecule is also connected to one base.
    • Phosphodiester bond: Phosphorus atom surrounded by 4 oxygen atoms, connects two sugar molecules
    • Base: The base is a molecule composed of C, H, O and N atoms connected by single/double bonds. There are 4 types of bases:
      • Adenine, Guanine, Cytosine and Thymine for DNA
      • Adenine, Guanine, Cytosine and Uracil for RNA
    Some terms used to describe this structure:
    • Nucleotide: A nucleotide is a single base, sugar molecule and phosphate group
    • Codon: A sequence of three nucleotides that codes for an amino acid
    • Gene: A sequence of codons that codes for a protein
    This structure was discovered by analyzing images from X-ray diffraction. The images, along with advances in X-ray diffraction analysis at the time, indicated the structure to be a double helix, with the constraint that bonding across the helix was restricted to A-T and G-C pairs. This supported earlier studies which indicated that the proportions of A, C, G and T followed a set of rules (A+G = C+T, A=T, G=C, called Chargaff's rules).
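The link between the pairing constraint and Chargaff's rules can be seen by building the partner strand and counting bases. This is a minimal sketch: the sequence is made up for illustration, and strand direction (5' to 3') is ignored, so the partner is simply aligned position by position.

```python
# Watson-Crick pairing across the helix: A with T, G with C
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_strand(strand: str) -> str:
    """Return the base-paired partner strand, aligned position by position."""
    return "".join(COMPLEMENT[base] for base in strand)


def base_counts(sequence: str) -> dict:
    """Count how many times each of the four DNA bases occurs."""
    return {base: sequence.count(base) for base in "ACGT"}


strand = "ATGGCGTACCTA"  # made-up sequence, for illustration only
double_helix = strand + complement_strand(strand)
counts = base_counts(double_helix)

print(counts)
print(counts["A"] == counts["T"], counts["G"] == counts["C"])
```

Because every A on one strand faces a T on the other (and every G faces a C), the counts over both strands must satisfy A=T and G=C, and therefore A+G = C+T, whatever the sequence is.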

    Information Storage

    DNA encodes the information for generating proteins. A sequence of 3 nucleotides, called a codon, maps to a single amino acid. A sequence of codons (a gene) encodes a sequence of amino acids, i.e. a protein. A codon can take one of 4^3 = 64 possible values. There are 20 standard amino acids, so multiple different triplets encode the same amino acid. In addition, some triplets have a special purpose: coding starts at the start codon ATG (which also codes for methionine) and stops at one of three triplets which do not specify any amino acid (TAA, TAG, TGA), leaving 61 triplets for the 20 amino acids. There is also a large quantity of junk DNA between known gene sequences, whose purpose is unknown.

    Proteins are produced inside the ribosomes, the cell's protein factory. The basic process is
            DNA->mRNA->Ribosome->tRNA->AminoAcids->Protein
    The process of protein generation from DNA occurs through the mechanisms of transcription and translation
    • Transcription: Information gets transferred from DNA to the ribosome via mRNA. The enzyme RNA polymerase unwinds the DNA into 2 strands called the sense and template strands. The template strand directs transcription (What happens to sense strands?). The template strand links to a growing messenger RNA (mRNA) strand by forming A-U, T-A and G-C bonds. The start of the coding sequence is marked by a start codon (AUG) and the end is marked by a stop codon (UGA/UAA/UAG). After creation, the mRNA moves to the ribosome.
    • Translation: Information from mRNA is used to create proteins via tRNA (the adapter molecules from Crick's Adapter Hypothesis). tRNAs bind temporarily to the mRNA while carrying their amino acid. As tRNAs enter and leave the ribosome, they leave behind a growing chain of amino acids, a protein. This continues until a stop codon is reached in the chain.
    Note some enzymes (such as polymerase) that help this process are themselves created by this process (a chicken and egg problem: which came first: the DNA template or polymerase?)
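The DNA -> mRNA -> protein flow described above can be sketched in a few lines. This is an illustration under simplifying assumptions: the template sequence is made up, strand direction is ignored, and the codon table is deliberately partial (only the codons used in the example, taken from the standard genetic code).

```python
# Partial codon table (standard genetic code); None marks a stop codon
CODON_TABLE = {
    "AUG": "Met",  # also serves as the start codon
    "UUU": "Phe",
    "GGC": "Gly",
    "GCA": "Ala",
    "UAA": None,
    "UAG": None,
    "UGA": None,
}

# Pairing used when mRNA is built against the DNA template strand
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template: str) -> str:
    """Build the mRNA that pairs with a DNA template strand."""
    return "".join(DNA_TO_RNA[base] for base in template)


def translate(mrna: str) -> list:
    """Read codons from the first AUG until a stop codon; return the amino acids."""
    start = mrna.find("AUG")
    if start == -1:
        return []
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]  # KeyError for codons not in the partial table
        if amino_acid is None:
            break
        protein.append(amino_acid)
    return protein


template = "TACAAACCGCGTATT"  # made-up template strand
mrna = transcribe(template)
print(mrna)
print(translate(mrna))
```

A full implementation would carry all 64 codons; the lookup table plays the role of the tRNA adapters, each of which matches one codon to one amino acid.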

    Information expression

    This is an area under active research. It seeks to answer why the same strand of DNA in each cell can cause different cells to perform different functions. The theory of the operon partly explains this. According to this theory, the DNA strand contains regulatory genes that code for repressors. A repressor may suppress expression of a gene by preventing creation of its mRNA. Information expression is vastly different depending on the type of organism:
    • Prokaryotes: No nucleus in cell, transcription/translation side by side in cell. Gene expression changes constantly
    • Eukaryotes: Nucleus in cell, transcription/translation are separated. Cells are highly specialized (e.g. liver cells vs. brain cells), i.e. very differentiated in expression. How differentiation happens is a topic of research.

    Replication

    DNA replication happens during cell division. The DNA unwinds and separates into 2 strands.  Enzymes attach complementary bases to each strand, producing 2 new double-stranded DNA molecules.
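    A minimal Python sketch of this copying step follows. It assumes the two strands are written position-aligned, so strand orientation is ignored, and it treats the enzymes as a simple base-pairing lookup. The names `complement` and `replicate` are illustrative.

```python
# Watson-Crick base pairing in DNA.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand):
    """Return the strand that pairs base-by-base with the given strand."""
    return "".join(COMPLEMENT[base] for base in strand)


def replicate(double_strand):
    """Separate a double strand and rebuild a partner for each half.

    Each resulting molecule keeps one original strand and gains one
    newly built complementary strand.
    """
    strand_a, strand_b = double_strand
    copy_one = (strand_a, complement(strand_a))
    copy_two = (complement(strand_b), strand_b)
    return copy_one, copy_two


original = ("ATGCCGTA", complement("ATGCCGTA"))
copy_one, copy_two = replicate(original)
print(copy_one == original and copy_two == original)  # True
```

    Because each base has exactly one partner, either strand alone carries enough information to rebuild the whole molecule.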

    Diversity

    Genetic diversity (different expressions within members of a single species) occurs mainly through reproduction.  In this process, genetic information is transmitted through chromosomes. A chromosome is a single double-stranded DNA molecule, tightly coiled and packed so that it does not look like an extended helix.  Chromosomes take their compact form when the cell's DNA condenses for cell division, at which point it largely ceases to be transcribed/translated.  In humans there are 23 kinds, each in two copies; the members of each pair match, except for XY in males (XX in females).  Each chromosome contains specific genetic sequences. (Q: Is the sequence in each chromosome random? How does this account for the creation of different DNA in progeny? How is the DNA formed for chromosomes?) Experiments have shown that organisms as primitive as bacteria can diversify genetic information by this process. Other forms of diversity can be caused by
    • Mutations
    • Viruses

    Other topics discussed include:

    • Cloning: The process consists of the following steps
      • Take nucleus from grown organism
      • Implant it in an enucleated egg
      • The resulting organism should have identical DNA
    • Sequencing
      • Techniques pioneered by Sanger
      • Map gene sequences to characteristics/traits
      • Map gene sequences to diseases
      • SNPs (Single nucleotide polymorphisms): Single-base differences between individuals, some of which are associated with disease
    • Stem cells
      • Mature cells can be converted to stem cells (Shinya Yamanaka)
    • Applications
      • Crime: DNA contains Variable Number of Tandem Repeat (VNTR) sequences: non-coding sequences, 9 to 80 bases long, repeated up to 30 times. These can be used to identify DNA with a low probability of error.
      • Medical Research, Biotechnology
    • The selfish gene: Organisms exist to propagate DNA
    • Epigenetics, Origin of life
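    The VNTR-based identification listed under Applications can be sketched in Python as follows. This is a toy illustration under stated assumptions: the motif and sample sequences are invented, real profiling compares many loci measured in the lab, and the helper names are hypothetical.

```python
import re


def max_tandem_repeats(sequence, motif):
    """Return the longest run of back-to-back copies of motif in sequence."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(match.group(0)) // len(motif) for match in pattern.finditer(sequence)]
    return max(runs, default=0)


def same_profile(sample_a, sample_b, motifs):
    """Compare two samples by their repeat count for each motif."""
    return all(
        max_tandem_repeats(sample_a, motif) == max_tandem_repeats(sample_b, motif)
        for motif in motifs
    )


motif = "ACGTTGCAA"  # hypothetical 9-base repeat unit
crime_scene = "TT" + motif * 4 + "GG" + motif * 2 + "CC"
suspect_one = "GA" + motif * 4 + "TC"
suspect_two = "GA" + motif * 7 + "TC"

print(max_tandem_repeats(crime_scene, motif))           # 4
print(same_profile(crime_scene, suspect_one, [motif]))  # True
print(same_profile(crime_scene, suspect_two, [motif]))  # False
```

    A single locus matches many people by chance; the low probability of error comes from combining repeat counts across several independent loci.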