Monday, October 22, 2012

"DNA: A graphic guide to the molecule that shook the world" - Israel Rosenfield, Edward Ziff and Borin Van Loon

Discusses the DNA molecule including
  • A historical background of genetics prior to the discovery of the molecular structure of DNA
  • Chemical structure
  • Information storage
  • Information expression
  • Replication
  • Diversity
  • Related topics including cloning, sequencing, stem cells, epigenetics and the origin of life.

The topics are covered in chronological order of discovery, with extensive background on the researchers involved in the discoveries. The following key discoveries are discussed in detail:
  • Watson-Crick double helix model of the DNA molecular structure
  • Crick's Adapter Hypothesis for information expression
  • Operon model of regulation (Jacob/Monod) for information expression


Modern genetics begins with the discovery of the molecular structure of DNA. Prior to this, researchers had concluded that cell division, along with the fusion of nuclei, was responsible for the transmission of genetic information. It was known through Mendel's research that phenotypes (observed properties) were the result of genotypes (genetic makeup) and that traits were randomly segregated during reproduction. A substance known as chromatin, the material that chromosomes are made of, had been identified in the nucleus. Chromatin was suspected to be involved in heredity, but it was believed that protein sequences (and not DNA) were the primary mechanism for information storage/transfer.

Chemical structure

DNA is a molecule that consists of bases, phosphodiester bonds and a sugar backbone.
  • Sugar backbone: The backbone is a chain of sugar molecules. Each sugar molecule is composed of C, H and O atoms connected by single bonds in a five-membered ring (ribose). The sugars are identical for RNA and DNA except that the RNA sugar has an OH group in one location, while the DNA sugar has H in that location (hence deoxyribose). Each sugar molecule is connected to the next sugar molecule via a phosphodiester bond, and each is also connected to one base.
  • Phosphodiester bond: A phosphorus atom surrounded by 4 oxygen atoms, which connects two sugar molecules
  • Base: The base is a molecule composed of C, H, N and (in most bases) O atoms connected by single/double bonds. There are 4 types of bases:
    • Adenine, Guanine, Cytosine and Thymine for DNA
    • Adenine, Guanine, Cytosine and Uracil for RNA
Some terms used to describe this structure:
  • Nucleotide: A single base together with its sugar molecule and phosphate group
  • Codon: A sequence of three nucleotides that codes for an amino acid. (A gene is a longer sequence of codons that codes for a protein.)
This structure was discovered by analyzing X-ray diffraction images. The images, along with advances in X-ray diffraction analysis at the time, indicated that the structure is a double helix, with the constraint that bonding across the helix is restricted to A-T and G-C pairs. This agreed with earlier studies which indicated that the proportions of A, C, G and T followed a set of rules (A+G = C+T, A=T, G=C, called Chargaff's rules).
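As a minimal sketch of why A-T/G-C pairing implies Chargaff's rules, the Python snippet below builds the partner strand for a made-up sequence and counts the bases across both strands. The function names and the example sequence are assumptions for illustration, and strand direction is ignored for simplicity.

```python
from collections import Counter

# Watson-Crick pairing: A bonds with T, G bonds with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand: str) -> str:
    """Return the base-by-base partner strand (strand direction is ignored)."""
    return "".join(PAIR[base] for base in strand)

def base_counts(strand: str) -> Counter:
    """Count bases across both strands of the double helix built from `strand`."""
    return Counter(strand + complement(strand))

counts = base_counts("ATGGCGTACCTTAG")  # hypothetical example sequence
print(counts["A"] == counts["T"], counts["G"] == counts["C"])
print(counts["A"] + counts["G"] == counts["C"] + counts["T"])
```

Because every A on one strand faces a T on the other (and every G faces a C), the equalities hold for any input sequence, not just this one.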

Information Storage

DNA encodes the information for the generation of proteins. A sequence of 3 nucleotides, called a codon, maps to a single amino acid. A gene is a sequence of codons that encodes a sequence of amino acids, i.e. a protein. A codon can take one of 4^3 = 64 possible values. Only 20 amino acids are used, so multiple different triplets encode the same amino acid: 61 of the 64 codons specify amino acids. In addition, some triplets have a special purpose: coding starts at a delimiter, ATG (which also codes for the amino acid methionine), and stops at one of the 3 remaining triplets, which do not specify any amino acid. There is also a large quantity of junk DNA between known gene sequences, whose purpose is unknown.
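The codon arithmetic can be checked with a short Python sketch. It assumes the standard genetic code, written as the usual compact 64-letter string in T, C, A, G order with "*" marking a stop codon; the variable names are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; '*' marks a stop codon.
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
amino_acids = {aa for aa in CODON_TABLE.values() if aa != "*"}

print(len(CODON_TABLE))                     # 64 = 4^3 possible triplets
print(stop_codons)                          # ['TAA', 'TAG', 'TGA']
print(len(amino_acids))                     # 20 distinct amino acids
print(len(CODON_TABLE) - len(stop_codons))  # 61 triplets that specify an amino acid
print(CODON_TABLE["ATG"])                   # M (methionine, also the start signal)
```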

Proteins are produced inside the ribosomes, the cell's protein factories. The process of protein generation from DNA occurs through the mechanisms of transcription and translation:
  • Transcription: Information gets transferred from DNA to the ribosome via mRNA. The enzyme RNA polymerase unwinds the DNA into 2 strands called the sense and template strands. The template strand directs transcription (What happens to the sense strand?). Bases on the template strand pair with the bases of a growing messenger RNA (mRNA) strand by forming A-U and G-C bonds (T on the template pairs with A). Within the mRNA, the start of the coding sequence is marked by a start codon (AUG) and its end by a stop codon (UGA/UAA/UAG). After creation, the mRNA moves to the ribosome.
  • Translation: Information from the mRNA is used to create proteins via tRNA (the adapter molecules from Crick's Adapter Hypothesis). Each tRNA binds temporarily to the mRNA while carrying its own amino acid. As tRNAs enter and leave the ribosome, they leave behind a growing chain of amino acids, a protein. This continues until a stop codon is reached in the mRNA.
Note that some enzymes (such as polymerase) that help this process are themselves created by this process (a chicken-and-egg problem: which came first, the DNA template or the polymerase?)
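The two steps above can be sketched in Python as a toy model. The function names and the template sequence are hypothetical, strand direction and everything the real cellular machinery does (promoters, tRNA, ribosome mechanics) are ignored, and the codon table is the standard genetic code written with U in place of T.

```python
from itertools import product

# Standard genetic code for mRNA, in UCAG order; '*' marks a stop codon.
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("UCAG", repeat=3), AMINO)}

# Pairing used while the mRNA strand grows along the DNA template strand.
TEMPLATE_TO_MRNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template: str) -> str:
    """Build the mRNA strand by pairing each base of the DNA template strand."""
    return "".join(TEMPLATE_TO_MRNA[base] for base in template)

def translate(mrna: str) -> str:
    """Read codons from the first AUG until a stop codon; return one-letter amino acids."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is translated
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino = CODON_TABLE[mrna[i:i + 3]]
        if amino == "*":  # stop codon ends the chain
            break
        protein.append(amino)
    return "".join(protein)

mrna = transcribe("TACGGTCATATT")  # hypothetical template strand
print(mrna)             # AUGCCAGUAUAA
print(translate(mrna))  # MPV
```

The result reads as methionine, proline, valine: the start codon itself contributes the first amino acid, and the stop codon contributes none.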

Information expression

This is an area under active research that seeks to answer why the same DNA in every cell can cause different cells to perform different functions. The theory of the operon partly explains this. According to this theory, the DNA strand contains regulatory sequences that code for repressors. A repressor may suppress the expression of a gene by preventing the creation of its mRNA. Information expression differs vastly depending on the type of organism:
  • Prokaryotes: No nucleus in the cell; transcription and translation occur side by side. Gene expression changes constantly.
  • Eukaryotes: Nucleus in the cell; transcription and translation are separated. Cells are highly specialized (e.g. liver cells vs. brain cells), i.e. very differentiated in expression. How differentiation happens is a topic of research.


Replication

DNA replication happens ahead of cell division. The DNA double helix unwinds into 2 strands. Enzymes attach complementary bases to each strand, producing 2 new double-stranded DNA molecules.
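A minimal Python sketch of this copying step, assuming simple A-T/G-C pairing and ignoring strand direction and the enzymes involved; `replicate` and the example sequence are hypothetical names chosen for illustration.

```python
# Watson-Crick pairing used when new bases bond to an unwound strand.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand: str) -> str:
    """Return the partner strand, base by base."""
    return "".join(PAIR[base] for base in strand)

def replicate(duplex: tuple[str, str]) -> list[tuple[str, str]]:
    """Unwind a duplex and build a new partner for each of the two old strands."""
    old_a, old_b = duplex
    return [(old_a, complement(old_a)), (complement(old_b), old_b)]

parent = ("ATGCCGTA", complement("ATGCCGTA"))  # hypothetical sequence
daughters = replicate(parent)
print(all(daughter == parent for daughter in daughters))
```

Each daughter molecule keeps one strand from the parent and gains one newly built strand, and both daughters carry the same sequence as the parent.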


Diversity

Genetic diversity (different expressions within members of a single species) occurs mainly through reproduction. In this process, genetic information is transmitted through chromosomes. A chromosome is a single double-stranded DNA molecule, although in its tightly packed form it does not look like a helix. Chromosomes form when the DNA in the nucleus condenses ahead of cell division, at which point it ceases to be transcribed/translated. In humans there are 23 kinds, each in two copies; the two copies in a pair match, except for the sex chromosomes (XY in males, XX in females). Each chromosome contains specific genetic sequences. (Q: Is the sequence in each chromosome random? How does this account for the creation of different DNA in progeny? How is the DNA formed for chromosomes?) Experiments have shown that even organisms as primitive as bacteria can diversify genetic information by exchanging genetic material. Other forms of diversity can be caused by
  • Mutations
  • Viruses

Other topics discussed include:

  • Cloning: The process consists of the following steps
    • Take nucleus from grown organism
    • Implant it in an enucleated egg
    • The resulting organism should have identical DNA
  • Sequencing
    • Techniques pioneered by Sanger
    • Map gene sequences to characteristics/traits
    • Map gene sequences to diseases
    • SNPs (single nucleotide polymorphisms, pronounced "snips"): Some are associated with disease
  • Stem cells
    • Mature cells can be converted to stem cells (Shinya Yamanaka)
  • Applications
    • Crime: DNA contains Variable Number Tandem Repeat (VNTR) sequences: non-coding sequences, 9 to 80 bases long, repeated up to 30 times. The number of repeats varies between individuals, so these can be used to identify DNA with a low probability of error.
    • Medical Research, Biotechnology
  • The selfish gene: Organisms exist to propagate DNA
  • Epigenetics, Origin of life
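
The VNTR idea from the crime application above can be sketched in Python as a toy repeat counter. Real forensic profiling works on lab measurements of repeat lengths at several locations, so this is only an illustration; the repeat unit, the samples and the function names are all hypothetical.

```python
import re

def max_tandem_repeats(dna: str, unit: str) -> int:
    """Length, in copies, of the longest back-to-back run of `unit` in `dna`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", dna)
    return max((len(run) // len(unit) for run in runs), default=0)

def profile(dna: str, units: list[str]) -> tuple[int, ...]:
    """Repeat counts for several units; matching profiles suggest the same source."""
    return tuple(max_tandem_repeats(dna, unit) for unit in units)

UNIT = "AGGTCCTGA"  # hypothetical 9-base repeat unit
sample_a = "TT" + UNIT * 4 + "CG" + UNIT * 2
sample_b = "TT" + UNIT * 7 + "CG"

print(max_tandem_repeats(sample_a, UNIT))  # 4
print(max_tandem_repeats(sample_b, UNIT))  # 7
print(profile(sample_a, [UNIT]) == profile(sample_b, [UNIT]))  # False
```

Comparing counts at several independent repeat locations is what drives the probability of a chance match down.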

Sunday, October 7, 2012

Freakonomics - Steven Levitt/Stephen Dubner, 2005

Conclusions that appear to be intuitive/obvious are often logical fallacies of the type cum hoc ergo propter hoc (mistaking correlation for causation). Disproving such arguments is fairly common in the science/engineering fields, where the conventional ways of doing it are thorough theoretical methods such as regression analysis or Monte Carlo methods, or empirical measurements using:
  •  An identification of all factors that impact the results of an experiment
  •  Repeated experiments under controlled conditions, varying each parameter separately while keeping all others constant.
These techniques are either not applicable or inadequate for problems in sociology/economics, especially the kind of questions that Levitt seeks to answer, which involve incomplete/inaccurate quantitative data. The techniques and arguments he uses include deep studies/interviews, arguments of logic (where data is unavailable) and cross-domain collaborations with experts from other fields.

An outline of the work:
  •  Study of incentives and cheating (Chapter 1)
  •  Study of information and effect of information asymmetry (Chapter 2)
  •  Study of correlation vs. causation, or how conventional wisdom is often wrong (Chapters 3/4/5/6)

Chapter 1: Incentives and Cheating
  • Incentive: Mechanism to induce one behavior (favored) over another (unfavored) by providing a reward
  • Cheating: Mechanism to defeat an incentive: acquire reward while performing (unfavored) behavior
  • Three types of incentives: Moral, Social, Economic
  • Any clever incentive scheme will result in the creation of (or attempts to create) an equally clever cheating scheme
  • Some conclusions from case studies:
    • Approx. 90% of people do not attempt to cheat particular systems despite being able to do so (bagel experiment)
    • Cross-correlation analysis can be used very successfully to detect cheating (Chicago public education system, sumo wrestling)

Chapter 2: Information
  • Information asymmetry: When two parties to a transaction have vastly different degrees of expertise
  • Internet has reduced information asymmetry
  • Often exploited (real estate agents, car salesmen)
    • Can be extremely subtle: Terms in real estate ads, selling cars
    • Some revealed through correlation analysis
  • Information crime
    • Crimes committed by exploiting information asymmetry (Enron)
    • Difficult to discover, something drastic must happen

Chapter 3/4: Correlation vs. causation (Conventional wisdom)
  • Conventional wisdom (CW) can be incorrect in several cases.
  • Dramatic effects can be caused by subtle, overlooked, non obvious events
  • Events can be explained by careful study of the correct causes
  • Case studies demonstrating this include:
  • CW: Drug dealers make a lot of money
    • Reality: The structure of a drug dealing organization is similar to a corporation
    • Workers at the bottom have low wages, bad working conditions
    • Upper management keeps disproportionate share of profits
  • CW: Crime decreased in the 90s for several reasons
    • Correlations and logic used to check a number of likely causes
    • Closest correlation is with the Roe v. Wade case and the legalization of abortion

Chapter 5/6: Case studies on parenting
  • Correlation between parenting approaches and the future success of children
    • A number of parenting factors are examined including those which show correlation (positive/negative) with child's success and those which have no correlation
    • All factors appear to be correlated in some way to status of parents (education level of parents/affluence etc.)
    • What a parent does is irrelevant compared to who the parent is (education level/affluence etc.)
  • Correlation between children's names and future success of children
    • Choice of name and success are uncorrelated, even though name choice has a very strong correlation with race
    • There is a strong correlation between name choice and parents' characteristics (not the child's future)
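
The parenting observation above (factors correlate with a child's success only because both track who the parents are) can be illustrated with a small simulation in Python. This is a sketch on made-up data, not Levitt's analysis: `status`, `books` and `score` are hypothetical variables, and by construction `books` has no direct effect on `score`.

```python
import random

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def simulate(n: int = 5000, seed: int = 0):
    """Parent status drives both a parenting factor and the child's outcome."""
    rng = random.Random(seed)
    status = [rng.gauss(0, 1) for _ in range(n)]
    books = [s + 0.5 * rng.gauss(0, 1) for s in status]  # e.g. books in the home
    score = [s + 0.5 * rng.gauss(0, 1) for s in status]  # e.g. child's test score
    return status, books, score

status, books, score = simulate()
raw = pearson(books, score)
# Subtract the shared cause (valid here because both depend on status with weight 1).
adjusted = pearson(
    [b - s for b, s in zip(books, status)],
    [t - s for t, s in zip(score, status)],
)
print(round(raw, 2), round(adjusted, 2))
```

The raw correlation comes out strong while the adjusted one is close to zero, which is the pattern the chapter describes: the factor looks important until the parents' status is accounted for.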