Monday, October 22, 2012

"DNA: A graphic guide to the molecule that shook the world" - Israel Rosenfield, Edward Ziff and Borin Van Loon

"DNA: A graphic guide to the molecule that shook the world" - Israel Rosenfield, Edward Ziff and Borin Van Loon, 2011

The book discusses the DNA molecule including
  • A historical background of genetics prior to the discovery of the molecular structure of DNA
  • Chemical structure
  • Information storage
  • Information expression
  • Replication
  • Diversity
  • Related topics including cloning, sequencing, stem cells, epigenetics and the origin of life.

The topics are covered in chronological order of discovery, with extensive background on the researchers involved in the discoveries. The following key discoveries are discussed in detail:
  • Watson-Crick double helix model of the DNA molecular structure
  • Crick's Adapter Hypothesis for information expression
  • Operon model of regulation (Jacob/Monod) for information expression

Background:

Modern genetics begins with the discovery of the molecular structure of DNA. Prior to this, researchers had concluded that cell division, along with the fusion of nuclei, was responsible for the transmission of genetic information. It was known through Mendel's research that phenotypes (observed properties) were the result of genotypes (genetic makeup) and that traits were randomly segregated during reproduction. A substance known as chromatin, the material that makes up chromosomes, was known to originate from the nucleus. Chromatin was suspected to be involved in heredity, but it was believed that protein sequences (and not chromosomes/chromatin/DNA) were the primary mechanism for information storage/transfer.

Chemical structure

DNA is a molecule that consists of bases, phosphodiester bonds and a sugar backbone.
  • Sugar backbone: The backbone is a chain of sugar molecules. Each sugar molecule consists of C, H and O atoms connected by single bonds in a 5-sided ring structure (ribose). The molecules are identical for RNA and DNA except that the RNA sugar has OH in one location, while the DNA sugar has H in that location (hence deoxyribose). Each sugar molecule is connected to the next sugar molecule via a phosphodiester bond, and is also connected to one base.
  • Phosphodiester bond: A phosphorus atom surrounded by 4 oxygen atoms, connecting two sugar molecules
  • Base: A base is a molecule comprised of C, H, N and (in most cases) O atoms connected by single/double bonds. There are 4 types of bases:
    • Adenine, Guanine, Cytosine and Thymine for DNA
    • Adenine, Guanine, Cytosine and Uracil for RNA
Some terms used to describe this structure:
  • Nucleotide: A nucleotide is a single base, its sugar molecule and the attached phosphate group
  • Codon: A sequence of three nucleotides that codes for an amino acid. A gene is a longer sequence of codons that codes for a protein.
This structure was discovered by analyzing images from X-ray diffraction. The images, along with advances in X-ray diffraction analysis at the time, indicated that the structure was a double helix, with the constraint that bonding across the helix was restricted to A-T and G-C pairs. This supported earlier studies which indicated that the proportions of A, C, G and T followed a set of rules (A+G = C+T, A=T, G=C, called Chargaff's rules).
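
The link between base pairing and Chargaff's rules can be checked with a few lines of Python. This is only an illustrative sketch: the example strand is made up, and strand direction is ignored. If every A pairs with a T and every G with a C, the counts over both strands must match.

```python
from collections import Counter

# Pairing across the helix: A with T, G with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand):
    """Return the base-by-base partner strand (direction is ignored here)."""
    return "".join(PAIR[base] for base in strand)


def base_counts(strand):
    """Count each base over both strands of the double helix."""
    return Counter(strand + complement(strand))


# Hypothetical single strand, chosen only for illustration.
example_strand = "ATGGCGTACCTTAGA"
counts = base_counts(example_strand)

# Chargaff's rules: A = T, G = C, and therefore A + G = C + T.
print(counts["A"] == counts["T"], counts["G"] == counts["C"])
```

The rules hold for any input strand here, because each base on one strand contributes exactly one partner base on the other.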

Information Storage

DNA encodes the information for generation of proteins. A sequence of 3 nucleotides, called a codon, maps to a single amino acid. A sequence of codons (a gene) encodes information for a sequence of amino acids, i.e. a protein. A codon can take one of 4^3 = 64 possible values. There are only 20 standard amino acids, so most amino acids are encoded by several different triplets. In addition there are special sequences which have a special purpose, e.g. coding starts at a delimiter, ATG (which also codes for the amino acid methionine), and stops at one of 3 triplets which do not specify any amino acid, leaving 61 triplets that code for amino acids. In addition, there is a large quantity of junk DNA between known gene sequences, whose purpose is unknown.
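
The triplet arithmetic can be made concrete with the standard genetic code table. The sketch below is not from the book: the helper name `translate` and the sample sequence are assumptions for illustration, and the table uses the usual one-letter amino acid codes with "*" marking a stop triplet.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per amino acid, "*" = stop.
# Triplets are ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding sequence triplet by triplet, stopping at a stop triplet."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
distinct_amino_acids = set(AMINO_ACIDS) - {"*"}

# 64 triplets, 3 of them stops, 20 distinct amino acids.
print(len(CODON_TABLE), stop_codons, len(distinct_amino_acids))
# Hypothetical sequence: ATG (start, methionine), GCC (alanine), TAA (stop).
print(translate("ATGGCCTAA"))
```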

Proteins are produced inside the ribosomes, the cell's protein factory. The basic process is
        DNA->mRNA->Ribosome->tRNA->AminoAcids->Protein
The process of protein generation from DNA occurs through the mechanisms of transcription and translation
  • Transcription: Information gets transferred from DNA to the ribosome via mRNA. The enzyme RNA polymerase unwinds the DNA into 2 strands called the sense and template strands. The template strand directs transcription (What happens to the sense strand?). A growing messenger RNA (mRNA) strand is built along the template strand by pairing bases (A with U in place of T, G with C). Within the mRNA, the start of the coding sequence is marked by a start codon (AUG) and the end is marked by a stop codon (UGA/UAA/UAG). After creation the mRNA moves to the ribosome.
  • Translation: Information from mRNA is used to create proteins via tRNA (the adapter molecules from Crick's Adapter Hypothesis). Each tRNA binds temporarily to the mRNA and also carries its amino acid. As tRNAs enter and leave the ribosome, they leave behind a growing chain of amino acids, a protein. This continues until a stop codon is reached in the mRNA.
Note that some enzymes (such as polymerase) that help this process are themselves created by this process (a chicken and egg problem: which came first, the DNA template or the polymerase?).
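
The transcription step can be sketched in a few lines. This is a simplified illustration, not a model of the real machinery: the template strand is made up, strand direction is ignored, and the function names are hypothetical.

```python
# Base on the DNA template strand -> base added to the growing mRNA.
TEMPLATE_TO_MRNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
STOP_CODONS = {"UGA", "UAA", "UAG"}


def transcribe(template_strand):
    """Build the mRNA that pairs with the template strand, base by base."""
    return "".join(TEMPLATE_TO_MRNA[base] for base in template_strand)


def coding_codons(mrna):
    """Return the codons from the first AUG up to (not including) a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return []
    codons = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break
        codons.append(codon)
    return codons


# Hypothetical template strand, chosen only for illustration.
mrna = transcribe("GGTACGGCATTCC")
print(mrna, coding_codons(mrna))
```

Each codon in the result is what a tRNA would match during translation, one amino acid per codon.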

Information expression

This is an area that is under active research and seeks to answer the question of why the same DNA in each cell can cause different cells to perform different functions. The theory of the operon partly explains this. According to this theory, the DNA strand includes regulatory sequences that code for gene repressors. A repressor may suppress expression of a gene by preventing creation of its mRNA. Information expression is vastly different depending on the type of organism:
  • Prokaryotes: No nucleus in cell, transcription/translation side by side in cell. Gene expression changes constantly
  • Eukaryotes: Nucleus in cell, transcription/translation are separated. Cells are highly specialized (e.g. liver cells vs. brain cells), i.e. very differentiated in expression. How differentiation happens is a topic of research.

Replication

DNA replication happens before cell division. The DNA unwinds and separates into 2 strands. Enzymes attach complementary bases to each strand, producing 2 new double-stranded DNA molecules.

Diversity

Genetic diversity (different expressions within members of a single species) occurs mainly through reproduction. In this process, genetic information is transmitted through chromosomes. A chromosome is a single double-stranded DNA molecule, tightly coiled and packed together with proteins. Chromosomes form from the DNA in the nucleus, which condenses before cell division and largely ceases to be transcribed while condensed. In humans there are 23 kinds, each in two copies; the two copies of a pair match, except for the sex chromosomes (XY in males, XX in females). Each chromosome contains specific genetic sequences. (Q: Is the sequence in each chromosome random? How does this account for the creation of different DNA in progeny? How is the DNA formed for chromosomes?) Experiments have shown that organisms as primitive as bacteria can diversify genetic information by this process. Other forms of diversity can be caused by
  • Mutations
  • Viruses

Other topics discussed include:

  • Cloning: The process consists of the following steps
    • Take nucleus from grown organism
    • Implant it in an enucleated egg
    • The resulting organism should have identical DNA
  • Sequencing
    • Techniques pioneered by Sanger
    • Map gene sequences to characteristics/traits
    • Map gene sequences to diseases
    • SNPs (single nucleotide polymorphisms): Some are associated with disease
  • Stem cells
    • Mature cells can be converted to stem cells (Shinya Yamanaka)
  • Applications
    • Crime: DNA contains Variable Number of Tandem Repeat (VNTR) sequences: non-coding sequences, 9 to 80 bases long, repeated up to 30 times. These can be used to identify DNA with low probability of error.
    • Medical Research, Biotechnology
  • The selfish gene: Organisms exist to propagate DNA
  • Epigenetics, Origin of life



Monday, October 15, 2012

The Economics of Information Technology

The Economics of Information Technology - Hal Varian, Joseph Farrell and Carl Shapiro, 2004

The book covers two topics:
  • Section 1 (by Varian) is a comparison of the conventional economy with the information economy.
  • Section 2 (by Farrell/Shapiro) is a discussion of intellectual property and its effect on the information economy.

This synopsis covers most of Section 1. Section 2 (and the standards subsection of Section 1) will be covered in another post. The book summarizes results from research papers by Varian and other researchers in the field, and illustrates some of the principles via simple mathematical models and examples. The principles are summarized here in an informal manner.

The primary reason that the information economy differs from conventional economies is the cost structure: both hardware and software have high fixed costs and low marginal costs. Economic models indicate that this should result in a monopoly. Varian discusses the factors unique to the information economy which ensure that it does not (described in the sections below). In addition, he discusses factors that have contributed to recent rapid advances in the information economy: the effect of combinatorial innovation and open source software.
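
To see why this cost structure favors large producers, consider average cost per unit. The sketch below uses made-up numbers (a fixed cost of 1,000,000 and a marginal cost of 1 per unit) purely for illustration. Average cost keeps falling as volume grows, so the largest producer can always undercut smaller ones.

```python
def average_cost(fixed_cost, marginal_cost, quantity):
    """Average cost per unit: the fixed cost is spread over all units sold."""
    return fixed_cost / quantity + marginal_cost


# Hypothetical numbers, chosen only for illustration.
FIXED_COST = 1_000_000
MARGINAL_COST = 1

for quantity in (1_000, 100_000, 10_000_000):
    print(quantity, average_cost(FIXED_COST, MARGINAL_COST, quantity))
```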

Price differentiation

The information economy exhibits price differentiation of three types along with price differentiation based on other factors:
  • First degree: Where the product is marketed individually to customers through individual product and price discrimination (a market of one). Ulph shows that there are 2 effects in this kind of market:
    • Enhanced surplus: Firms charge closer to the "reservation price" of a customer.
    • Intensified competition: Each customer must be contested separately by competitors.
Where customer tastes are similar (homogeneous customers), competition dominates surplus and the customers benefit. The opposite happens in a market of heterogeneous customers.
  • Second degree: Where the product is marketed through product lines i.e. some form of versioning. Versioning promotes consumer welfare, by allowing segments to be addressed, albeit at lower quality level.
  • Third degree: Where the same product is marketed at different prices to different groups (price discrimination). Markets with largely homogeneous customers and fixed costs benefit (Armstrong/Vickers).
  • Purchase history/Search/Bundling: Pricing based purely on purchase history does not provide benefit when done by a monopoly. However, conditioning prices on consumer behavior (recommendations) can extract some value. The information economy has a fraction of consumers who make extensive use of search to find products at suitable prices. Several researchers have shown that when customers use search, firms randomize prices on a day to day basis (low to attract searchers, high for when non-searchers arrive). The information economy also allows products to be bundled together. The effects of bundling are:
    • Reduced dispersion of willingness to pay (variance of desired price reduces), making the demand curve more elastic.
    • Increased barriers to entry.

Switching costs/lock in

A simple model can be used to determine the first period (lock-in) and second period (monopoly) price/profit. The analysis shows that competition to acquire customers reduces the price in the lock-in period. Lock-in can therefore be beneficial to new customers, who get a reduced price, while older (locked-in) customers pay a high price. Interestingly, studies show that welfare may go either way (towards producers or consumers), but in most cases consumers are worse off.
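
A minimal two-period sketch shows the pattern. The assumptions are mine and may not match the book's exact model: identical competing firms, one unit bought per period, no discounting, and made-up numbers. In the second period a locked-in customer will pay up to the marginal cost plus the switching cost. Competition for new customers then pushes the first-period price down until the total profit per customer is zero.

```python
def lock_in_prices(marginal_cost, switching_cost):
    """Return (first_period_price, second_period_price) under the assumptions above."""
    # A locked-in customer stays as long as the price does not exceed
    # a rival's cost-level price plus the cost of switching.
    second_period_price = marginal_cost + switching_cost
    # Firms compete away the second-period margin with an up-front discount.
    first_period_price = marginal_cost - switching_cost
    return first_period_price, second_period_price


# Hypothetical numbers, chosen only for illustration.
first, second = lock_in_prices(marginal_cost=10, switching_cost=3)
total_profit = (first - 10) + (second - 10)
print(first, second, total_profit)
```

New customers pay below cost and locked-in customers pay above cost, while the firm's profit over both periods is zero.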

Supply side economies of scale

As mentioned before, the supply side exhibits high fixed costs and low marginal costs; economic analysis indicates that this should result in a natural monopoly. In the case of the information economy, however, the competition for a monopoly and scale appear to benefit consumers. In addition, the information economy has several characteristics that prevent long term monopolies:
  • Fixed costs are continuously reduced by advances in technology
  • Rapid market growth can remove the advantages of a monopoly
  • Disruptive technologies continuously change the nature of the market
  • Technology becomes obsolete before functions become obsolete.
The work analyzes supply side economies of scale through models and the welfare theorems. The welfare theorems indicate that a competitive economy is better for consumer welfare than a monopoly:
  • Every competitive equilibrium is Pareto efficient (no producer/consumer can improve without worsening some other producer/consumer)
  • Every Pareto efficient outcome can, under certain conditions, be achieved as a competitive equilibrium
However, where price discrimination exists (as in the information economy):
  • The monopoly has an equilibrium, where it captures all surplus and achieves Pareto efficiency
  • Competition for monopoly transfers this surplus to consumers with the same effect as a competitive economy.

Demand side economies of scale

Products of the information economy exhibit network effects, which may be direct (where the demand depends on how many other people use the product, e.g. using email to send messages requires other people to use it to receive them) or indirect (e.g. the availability of DVDs depends on a large market of people using them, so each user depends on others to increase the market). Systems with network effects exhibit multiple equilibrium points: with elastic supply there are 3 equilibrium points (no adoption, mass adoption and an unstable middle point). A product must cross the middle equilibrium point to start a positive feedback loop.
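
The three equilibrium points can be reproduced with a small model. The assumptions here are illustrative and may differ from the book's: consumers have a taste v spread uniformly between 0 and 1, a consumer values the product at v times the fraction n of people using it, and supply is perfectly elastic at a fixed price p. The marginal adopter has v = p/n, so adoption is self-consistent when n = 1 - p/n, i.e. n(1 - n) = p.

```python
import math


def adoption_equilibria(price):
    """Return the equilibrium adoption fractions n for a given positive price."""
    if price <= 0:
        raise ValueError("price must be positive")
    # Nobody adopting is always self-consistent: with n = 0 the product is worthless.
    equilibria = [0.0]
    discriminant = 1.0 - 4.0 * price
    if discriminant >= 0:
        root = math.sqrt(discriminant)
        # Lower root: unstable middle point. Upper root: mass adoption.
        equilibria.extend(sorted({(1 - root) / 2, (1 + root) / 2}))
    return equilibria


# Hypothetical prices, chosen only for illustration.
print(adoption_equilibria(0.16))
print(adoption_equilibria(0.30))
```

At a price of 0.16 the middle point sits near 20% adoption: below it adoption unravels to zero, above it adoption grows toward the high equilibrium near 80%. At a price of 0.30 the product is too expensive for any positive level of adoption to sustain itself.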

System effects

System effects refer to the availability of complementary technologies that a single consumer needs together (a system). Cournot analyzed the pricing of complements and found that the merger of a complementary pair of producers always benefits both consumers and producers.
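
Cournot's result can be illustrated with a linear example. The assumptions are mine: two components are only useful together, demand for the system is q = a - (p1 + p2), and production costs are zero. Two separate monopolists each maximize p_i * q, which gives p_i = a/3 each. A merged firm maximizes P * (a - P) for the total price P, which gives P = a/2.

```python
def separate_producers(a):
    """Each producer sets its own component price, taking the other's as given."""
    price_each = a / 3
    quantity = a - 2 * price_each
    return {
        "system_price": 2 * price_each,
        "total_profit": 2 * price_each * quantity,
        # Area of the triangle under linear demand and above the price.
        "consumer_surplus": quantity ** 2 / 2,
    }


def merged_producer(a):
    """A single firm sets one price for the whole system."""
    system_price = a / 2
    quantity = a - system_price
    return {
        "system_price": system_price,
        "total_profit": system_price * quantity,
        "consumer_surplus": quantity ** 2 / 2,
    }


# Hypothetical demand intercept, chosen only for illustration.
print(separate_producers(12))
print(merged_producer(12))
```

With a = 12 the merged firm charges less for the system (6 instead of 8), earns more in total (36 instead of 32), and leaves consumers with more surplus (18 instead of 8). Each separate producer ignores the fact that raising its own price also reduces sales of the other component.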

Transactions

The information economy enables the use of new types of contracts based on fine grain revenue sharing. In most cases these are found to increase efficiency.

Sunday, October 7, 2012

Freakonomics

Freakonomics - Steven Levitt/Stephen Dubner, 2005

This work illustrates, through case studies, how conclusions that appear to be intuitive/obvious are actually logical fallacies of the type cum hoc ergo propter hoc (correlation does not imply causation). Disproving such arguments is fairly common in science/engineering, where the conventional approaches are theoretical methods such as regression analysis, Monte Carlo methods, or empirical measurements using:
  • An identification of all factors that impact the results of an experiment
  • Repeated experiments under controlled conditions, varying each parameter separately while keeping all others constant.
As one may expect, these techniques are either not applicable or inadequate for problems in sociology/economics, especially the kind of questions that Levitt seeks to answer, which involve incomplete/inaccurate quantitative data. This makes his techniques and arguments very interesting: they mostly involve deep studies/interviews, arguments of logic (where data is unavailable) and cross-domain collaborations with experts from other fields.

An outline of the work:
  • Study of incentives and cheating (Chapter 1)
  • Study of information and effect of information asymmetry (Chapter 2)
  • Study of correlation vs. causation or how conventional wisdom is often wrong (Chapter 3/4/5/6)

Chapter 1: Incentives and Cheating
  • Incentive: Mechanism to induce one behavior (favored) over another (unfavored) by providing a reward
  • Cheating: Mechanism to defeat an incentive: acquire reward while performing (unfavored) behavior
  • Three types of incentives: Moral, Social, Economic
  • Any clever incentive scheme will result in the creation of (or attempts to create) an equally clever cheating scheme
  • Some conclusions from case studies:
    • Approx 90% of humans do not attempt to cheat particular systems despite ability to do so (bagel experiment)
    • Cross-correlation analysis can be used very successfully to detect cheating (Chicago public education system, Sumo wrestling)

Chapter 2: Information
  • Information asymmetry: When two parties to a transaction have vastly different degrees of expertise
  • Internet has reduced information asymmetry
  • Often exploited (real estate agents, car salesmen)
    • Can be extremely subtle: Terms in real estate ads, selling cars
    • Some revealed through correlation analysis
  • Information crime
    • Crimes committed by exploiting information asymmetry (Enron)
    • Difficult to discover, something drastic must happen

Chapter 3/4: Correlation vs. causation (Conventional wisdom)
  • Conventional wisdom (CW) can be incorrect in several cases.
  • Dramatic effects can be caused by subtle, overlooked, non obvious events
  • Events can be explained by careful study of the correct causes
  • Case studies demonstrating this include:
  • CW: Drug dealers make a lot of money
    • Reality: The structure of a drug dealing organization is similar to a corporation
    • Workers at the bottom have low wages, bad working conditions
    • Upper management keeps disproportionate share of profits
  • CW: Crime decreased in the 90s for several reasons
    • Correlations and logic used to check a number of likely causes
    • Closest correlation is with the Roe v. Wade case and the legalization of abortion

Chapter 5/6: Case studies on parenting
  • Correlation between parenting approaches and the future success of children
    • A number of parenting factors are examined including those which show correlation (positive/negative) with child's success and those which have no correlation
    • All factors appear to be correlated in some way to status of parents (education level of parents/affluence etc.)
    • What a parent does is irrelevant compared to who the parent is (education level/affluence etc.)
  • Correlation between children's names and future success of children
    • Choice of name and success are uncorrelated, even though name choice has very strong correlation with race
    • There is a strong correlation between name choice and parents' characteristics (not child's future)
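
The parenting result is an example of a hidden common cause. The simulation below is a toy sketch with made-up data, not Levitt's data or method. Parent status drives both a parenting action and the child's success, while the action itself has no effect. The raw correlation between action and success is still clearly positive, and it largely disappears once parent status is controlled for.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, zs):
    """Remove the part of ys that a straight-line fit on zs explains."""
    n = len(ys)
    mz, my = sum(zs) / n, sum(ys) / n
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    return [y - my - slope * (z - mz) for y, z in zip(ys, zs)]


# Simulated (made-up) population: status causes both action and success.
rng = random.Random(0)
status = [rng.gauss(0, 1) for _ in range(5000)]
action = [s + rng.gauss(0, 1) for s in status]
success = [s + rng.gauss(0, 1) for s in status]

raw = pearson(action, success)
controlled = pearson(residuals(action, status), residuals(success, status))
print(round(raw, 2), round(controlled, 2))
```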