Tuesday, 11 June 2019

Paper Writing - Style

STYLE - The following tips help maintain a good writing style.
  1. Cut the clutter
    1. Remove clunky words/phrases - important, as it is well known that, and it should be emphasized that
    2. Remove hedge words - e.g. very and appreciable
    3. Remove unnecessary jargon and acronyms.
    4. Remove repetitive words/phrases. 
    5. Eliminate negatives. 
    6. Omit needless prepositions - e.g. that
    7. Use adverbs sparingly - e.g. really, basically, and generally
    8. Shorten long words and phrases - e.g. due to the fact that to because
  2. Use the active voice
  3. Write with verbs - Use strong verbs. Don't turn verbs into nouns. Don't bury the main verb
Good Style, John Kirkman - A writer should aim for accuracy, clarity, readability and the right tone in scientific and technical writing.
  1. Sentences - Sentences should be reasonably short and not too complex. To vary sentence structure, use compound and complex constructions.
  2. Vocabulary 
    1. In general, prefer short words to long words, ordinary words to grand words, familiar words to unfamiliar words, non-technical words to technical words and concrete words to abstract words.
    2. Use jargon only if it is genuinely necessary.
    3. Avoid fashionable words such as functionality, enhance (improve/increase?), parameter (value/variable?), peripheral, viable or inhibit (stop/reduce?) because they have become unreliable units of information exchange. 
  3. Phrases 
    1. Avoid roundabout phrasings that use abstraction or nominalization. Avoid abstraction by being as specific as possible. Avoid nominalization, thereby getting rid of colorless verbs such as achieve, perform, accomplish, carry out, conduct, observe.
    2. Avoid unusual phrasings that use words such as having and being
    3. Avoid excessive use of adjectives as premodifiers. Usually, one or two adjectives, especially number adjectives and color adjectives, come before the noun, and other modifiers come after it. 
    4. Avoid excessive use of nouns as premodifiers. Premodifying nouns are less explicit than postmodifying prepositional constructions. Confusion may also arise when nouns are used as premodifiers along with prepositions, transitive verbs, and to be and to become. The trick is to introduce the agent as soon as possible. 
  4. Verbs 
    1. Tense - Use past tense to state what the objectives were, what equipment was used and what procedures were followed. Use present tense to state ‘eternal truths’ and in discussions of data or results. 
    2. Voice - Use a proper mixture of active and passive voice. Use active voice as far as possible. Use passive voice only when the agent is unimportant, when the agent is not known, when we do not want to state who the agent is and while stating a generally-held belief. Improper use of passive voice can lead to distortion in meaning, roundabout phrasing and ambiguity. Active writing does not have to be personal. It may be desirable to ensure objectivity in scientific and technical writing, but not at the cost of clarity. Avoid it...that constructions. 
  5. Punctuation
    1. Use the right punctuation in the right places. Punctuation is an integral part of written communication.
    2. Add a comma after discourse markers such as however, well, since, ...
  6. Tone - Avoid cheap attempts at producing user-friendly text that may come across to the reader as overly patronizing. A comfortable, conversational, user-friendly tone is best produced by using simple vocabulary in direct address to the user.

Paper Writing - Content

Writing in the Sciences, Kristin Sainani
  1. Talk to others about your research
  2. Make sure you have something to say
  3. Logically explain what you have to say
Steps in Paper Writing 
  1. Pre-writing (content dump) - 70%
  2. Writing - 10%
  3. Revision (rewriting, formatting) - 20%
How to Write a Great Scientific Paper
  1. Don’t wait, just start writing - forces us to be clear and focused; open the way to initiate dialogue with others; a way to develop ideas, instead of an output medium 
  2. Identify your key idea - a paper is an idea-conveying mechanism; the paper should have exactly one clear, sharp idea; “the main idea of this paper is...”, “in this section we present the main contributions of the paper.”  
  3. Tell a story
    1. narrative flow: (1) here is a problem, (2) it is an interesting problem, (3) it is an unsolved problem, (4) here is my idea, (5) my idea works, (6) here’s how my idea compares to other people’s approaches
    2. structure: (1) title, (2) abstract, (3) introduction, (4) the problem, (5) my idea, (6) the details, (7) related work, (8) conclusions and future work 
  4. Nail your contributions to the mast - first page of the paper is very important; describe the problem and state your contributions; use an example; idea vs. contribution; 
  5. Related work - put it at the end of the paper; explain your idea first, along with details and then compare to existing work; acknowledge weaknesses in your approach; be generous to the competition; 
  6. Put your readers first - use examples, figures well
  7. Listen to your readers - internal review; experts and non-experts; use the guinea pigs carefully; “just put one mark where you get lost and then we’ll talk about it”;

Friday, 3 May 2019

Standard Errors, Statistics in Plain English, Timothy C. Urdan

Chapter 6: Standard Error

Sampling distribution of a statistic s - The distribution of s over all samples randomly drawn from the population.
Mean of sampling distribution of a statistic s - It is known as the expected value of s because it is the same as the corresponding population parameter. For example, the mean of the sampling distribution of the mean equals the population mean; the mean of the sampling distribution of the standard deviation is (approximately) the population standard deviation.

Standard deviation of sampling distribution of a statistic s - It is known as the standard error of s. The standard error is very useful in assessing how well a sample randomly drawn from a population represents the population: the smaller the standard error, the better the sample represents the population.

Central Limit Theorem (CLT) - According to the CLT, sampling distribution of the mean will be normally distributed, as long as the sample size is reasonably large. 
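The CLT can be illustrated with a short simulation. This is a minimal sketch (the population and all numbers are made up): it draws repeated samples from a deliberately skewed population and checks that the sample means cluster around the population mean, with spread close to the standard error.

```python
import random
import statistics

# A made-up, skewed population (exponential, mean ~1.0, SD ~1.0).
random.seed(0)
population = [random.expovariate(1.0) for _ in range(100_000)]

# Draw many random samples of size n and record each sample mean.
n = 50
means = [statistics.mean(random.sample(population, n)) for _ in range(2_000)]

# Per the CLT, the sample means are centred on the population mean,
# with standard deviation close to the standard error sigma / sqrt(n),
# even though the population itself is far from normal.
print(round(statistics.mean(means), 2))
print(round(statistics.stdev(means), 2))
```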

Q: Given a sample, how well does it resemble the population? 
A: Using the CLT, compute the z-score of the sample mean and then compute the probability of that z-score from the normal distribution

Q: What to do in case the sample size is very small?
A: Use the family of t distributions instead of the normal distribution. For large samples (e.g. >120) use the normal distribution itself.

Note: z score for normal distribution; t value for t distributions
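The answer above can be sketched with the standard library alone; the population values and sample mean below are hypothetical. For a small sample, the same calculation would use a t distribution (e.g. scipy.stats.t) instead of NormalDist.

```python
import math
from statistics import NormalDist

# Hypothetical numbers: population mean 100, population SD 15,
# and a sample of n = 36 whose mean came out as 105.
mu, sigma, n, xbar = 100, 15, 36, 105

se = sigma / math.sqrt(n)   # standard error of the mean
z = (xbar - mu) / se        # z-score of the sample mean

# Two-tailed probability of seeing a sample mean at least this
# far from mu, under the normal distribution given by the CLT.
p = 2 * (1 - NormalDist().cdf(abs(z)))
print(round(z, 2), round(p, 4))  # → 2.0 0.0455
```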

Sampling error - The difference between a sample statistic and the corresponding population parameter.

Chapter 7: Statistical Significance 
Statistical Testing - It is known as hypothesis testing (Neyman and Pearson) and significance testing (Fisher). 

An example of a statistical test is the one-sample t test, which is used to check how well a sample represents the population. For instance, we might want to know whether a random sample of 50 viewers of a television show represents the sentiment of all the viewers of the show.

Statistical Significance - A phenomenon observed in a sample is statistically significant if it has meaningful implications for the population. For example, if the difference between the sample mean and population mean is statistically significant, then the sample might not be a good representation of the population.

Effect Size - Statistical significance is greatly influenced by the sample size. Effect size alleviates this problem: it measures the magnitude of the observed phenomenon independently of the sample size.

Confidence Interval - A CI provides a range of plausible values for a population parameter. It is computed from a sample.
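As a sketch (the sample values are invented), a 95% CI for a population mean can be computed from a sample like this. For a sample this small, a t critical value (about 2.26 for 9 degrees of freedom) would strictly be more appropriate than the z value used here.

```python
import math
from statistics import NormalDist, mean, stdev

# A hypothetical sample of 10 exam scores.
sample = [88, 92, 79, 85, 90, 95, 83, 87, 91, 84]
n = len(sample)
xbar, s = mean(sample), stdev(sample)

# 95% CI using the normal critical value (~1.96); for small n,
# a t critical value would widen the interval slightly.
z_crit = NormalDist().inv_cdf(0.975)
margin = z_crit * s / math.sqrt(n)
print(f"95% CI: ({xbar - margin:.1f}, {xbar + margin:.1f})")
```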

Alpha level - A cutoff for rejecting the null hypothesis. The lower the alpha level, the stronger the argument. It is commonly set to 0.05 (5%).

One-tailed  test vs. two-tailed test

Chapter 9: t Tests 
Independent Samples t Test - It is based on the sampling distribution of the difference between the means of two independent samples. An example is when we might want to know whether a random sample of 50 men differs significantly from a random sample of 50 women in their average enjoyment of a new television show.

Dependent Samples t Test - It is based on the sampling distribution of the difference between the means of two dependent samples. An example is when we might want to know whether a random sample of 50 men before retirement differs significantly from the same 50 men after retirement in their average enjoyment of a new television show.
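Both tests can be sketched from their definitions using only the standard library; the enjoyment ratings below are invented, and scipy.stats.ttest_ind / ttest_rel would report the same statistics together with p-values.

```python
import math
from statistics import mean, stdev

# Hypothetical enjoyment ratings (1-10) for the chapter's example.
men   = [6, 7, 5, 8, 6, 7, 5, 6]
women = [8, 9, 7, 8, 9, 6, 8, 9]

def independent_t(a, b):
    """Independent-samples t statistic with pooled variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))

def dependent_t(before, after):
    """Dependent (paired) samples t: a one-sample t on the differences."""
    d = [x - y for x, y in zip(before, after)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))

print(round(independent_t(men, women), 2))
```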

Monday, 15 April 2019

മലയാളഭാഷാ വ്യാകരണം

Malayalam is a Dravidian language, spoken predominantly in the state of Kerala in India by about 38M people. The word order is subject-object-verb, as opposed to the subject-verb-object order of English. Malayalam has been heavily influenced by other languages such as Sanskrit and Tamil. Malayalam, along with the other Dravidian languages, is classified as an agglutinative language by linguists, and the language uses many inflections. 

സന്ധി - സന്ധി defines the joining of letters. There are many ways to perform സന്ധി in Malayalam: (1) അദേശസന്ധി (മരം+കൾ=മരങ്ങൾ), (2) ലോപസന്ധി (വനം+മേഖല=വനമേഖല), (3) ആഗമസന്ധി (ദയ+ഉള്ള=ദയ+യ്+ഉള്ള), (4) ദ്വിത്വസന്ധി (അര+പട്ട=അരപ്പട്ട), ... 

സമാസം - സമാസം deals with the joining of words. ഉദാ: തീ തുപ്പുന്ന വണ്ടി = തീവണ്ടി, അച്ഛൻ + അമ്മ = അച്ഛനമ്മമാർ. An example for word composition in English is: attendant during flight = flight attendant 

അലങ്കാരം - അലങ്കാരം deals with figures of speech ഉദാ: ഉപമ, ശ്ലേഷം. It is of two types: (1) ശബ്ദാലങ്കാരം, (2) അർത്ഥാലങ്കാരം 

വൃത്തം - വൃത്തം  deals with prosody. 

Nouns - Nouns are inflected for case and number. Nouns are not inflected for gender in Malayalam. The cases in  Malayalam are as follows:
  1. Nominative (രാമൻ) - Nominative case always denotes the subject of the sentence.
  2. Accusative (രാമനെ)  - Accusative noun denotes the object of the sentence. In sentences where there is a nominative, accusative and dative noun, the nominative will be the subject, the accusative the direct object and the dative, the indirect object.
  3. Sociative (രാമനോട്) - Sociative case is grammatically similar to accusative case, but semantically different. The  sociative nouns do not function in the role of experiencer but only as recipients.
  4. Dative (രാമന്, മേരിക്ക്) - In sentences where there is no nominative noun, the dative functions as the subject. In sentences involving both nominative and dative nouns, the latter functions as the indirect object.
  5. Instrumental (രാമനാൽ, വടികൊണ്ട്, വടിയിട്ട്)
  6. Genitive (മേരിയുടെ, രാമന്റെ)
  7. Locative (മുറിയിലേക്ക്, മുറിയിൽ, തണലത്തു, വെള്ളത്തിലൂടെ) - Locative case provides temporal and spatial meanings.
  8. Vocative (രാമാ, രാധേ)
Whether a case can mark the subject (+/-) and the object (+/-) of a sentence:
  Nominative (e.g. രാമൻ_NOM)    subject: +    object: -
  Dative (e.g. സീതയെ_DAT)       subject: +    object: +
  Accusative (e.g. പുസ്തകം_ACC)   subject: -    object: +
  Sociative (e.g. രാമനോട്)       subject: -    object: +
Examples: (1) Alice loves Bob. രാമൻ_NOM സീതയെ_DAT ഇഷ്ടപ്പെടുന്നു. (2) Alice gave the book to Bob. രാജു_NOM രാധയ്ക്ക്_DAT ആ പുസ്തകം_ACC കൊടുത്തു. 

Verbs - Morphology of verbs in Malayalam is complex due to the rich agglutination. Verbs are inflected for tense, aspect, mood and voice. There is no inflection for gender, person or number.
Tense and aspect (forms of ചെയ്യുക, 'to do'):
  Past - ചെയ്തു (simple); ചെയ്തുകൊണ്ടിരുന്നു (continuous); ചെയ്യാറുണ്ടായിരുന്നു, ചെയ്തിരുന്നു (habitual); ചെയ്തിരുന്നു, ചെയ്തുകഴിഞ്ഞിരുന്നു (pluperfect)
  Present - ചെയ്യുന്നു (simple); ചെയ്തുകൊണ്ടിരിക്കുന്നു (continuous); ചെയ്യാറുണ്ട് (habitual); ചെയ്തിരിക്കുന്നു, ചെയ്തിട്ടുണ്ട് (perfect)
  Future - ചെയ്യും (simple); ചെയ്തുകൊണ്ടിരിക്കും (continuous); ചെയ്തിരിക്കും, ചെയ്തുകാണും (perfect)
Mood - (1) Indicative - ചെയ്യുന്നു, ചെയ്തു, ... (2) Imperative - ചെയ്യണം, ചെയ്യ്, ... (3) Interrogative - ചെയ്തോ, ചെയ്യാമോ, ചെയ്യുമോ, ... (4) Subjunctive - ചെയ്യുമായിരുന്നു, ചെയ്തേനെ, ചെയ്തിരുന്നെങ്കിൽ, ... (5) Promissive - ചെയ്യും, ചെയ്തിരിക്കും (6) Possibility - ചെയ്യുമായിരിക്കും (7) Ability - ചെയ്യാം (8) Obligation - ചെയ്യണം
Voice - (1) Active - ചെയ്തു (2) Passive - ചെയ്യപ്പെട്ടു
Derived forms - (1) കേവലരൂപം (simple) - ചെയ്യുന്നു (2) പ്രയോജകരൂപം (causative) - ചെയ്യിക്കുന്നു

Further Reading
  1. Malayalam-English Dictionary, Hermann Gundert, 1872
  2. മലയാളഭാഷാവ്യാകരണം, Hermann Gundert, 1851
  3. കേരളപാണിനീയം, AR Raja Raja Varma, 1896
  4. The Essentials of Malayalam Grammar, L Garthwaite, 1903
  5. ശബ്ദശോധിനി, AR Raja Raja Varma, 1918 (2nd ed. of കേരളപാണിനീയം)
  6. Malayalam, R.E. Asher and T.C. Kumari, Routledge, 1997
  7. A Grammar of Malayalam, PhD Thesis, RSS Nair, 2012
  8. Malayalam Proverbs, Pilo Paul, 1902

Tuesday, 23 October 2018

Communication, Internal Seminar, HITS

Yesterday I attended a seminar led by Kerstin Hoppenhaus, the science journalist currently at HITS. It was remarkable because it made me think about how communication works and why trustworthy journalism is important.

Journalists curate news in a way that is interesting and digestible to their audience. A piece of news can carry a certain amount of bias, which might creep in consciously or unconsciously. Some of the tricks of the trade to seduce the audience are (1) telling a story, (2) engaging the user, and (3) using techniques such as a dramatic pause, a captivating background score or a protagonist.

The format of delivery can be literary, auditory or visual. Different formats pose different challenges. For instance, consider the 360-degree video format in which the director can no longer direct the user through the intended narrative because of the shift in control from the director to the user. 

We then discussed how the Internet has disrupted the media industry. With the Internet, quick communication is possible. However, a major challenge here is fake news. As a reader, how can you trust a piece of news that you see on the Internet? Some of the parameters are the reputation of the news source, valid verifiable references, the accuracy of the news content, and the style of presentation.

Finally we discussed whether simplification of news pays off. Simplification improves the extent to which the news penetrates the audience. However, over-simplification might have to let go of some necessary details.

Sunday, 2 September 2018

Mentalese, The Language Instinct, Steven Pinker

This post is a summary of the chapter "Mentalese" from the book "The Language Instinct" written by the eminent linguist Steven Pinker. This chapter deals with the question - Do we think in language? What is the language of thought?

According to the Sapir-Whorf hypothesis, language shapes our model of the world. For example, forms of addressing the listener differ in many languages depending on age or relation. The demands made by the language might influence the speakers to regard the listener with respect. In languages such as Hindi or Tamil, formality is frequently employed in day-to-day communication. This is in contrast to other languages such as Malayalam, in which formality is not commonly employed, in spite of the presence of formal and informal forms of address.

Pinker challenges this hypothesis in various ways. First, it is difficult to test this hypothesis because of the circular nature of the existing experiments: a subject can only be evaluated based on what he/she speaks. Second, the 'fact' that the Eskimos have hundreds of words for snow is nothing but an urban legend. According to one survey, the actual number is less than ten. So the Eskimos regard snow much the way others do, contrary to what the hypothesis can be taken to mean. Third, there are many beings that cannot speak but possess the faculty of thought: deaf people, babies, animals (, plants?). Fourth, many a time, even though we have a thought in our mind, we struggle to express it clearly. This can be due to two reasons: (1) lack of command over the language, or (2) lack of devices in the language to convey our thoughts. Movies are arguably a more powerful artistic medium than, say, poetry or music, because they use audio as well as video. Fifth, it is possible for us to transform (e.g. rotate or zoom in) an image in our minds without speaking about the process aloud. All of the above suggest that English or any other spoken language is not the language of thought.

Pinker claims the language of thought (a.k.a. Mentalese) to be on a level that is different from spoken languages. All thinking creatures use this language to think. Since only human beings possess the ability to talk, there is a program in the human brain, similar to a compiler for programming languages, that converts the high-level Mentalese to the low-level English.

Saturday, 11 August 2018

Chatterboxes, The Language Instinct, Steven Pinker

This post is a summary of the chapter "Chatterboxes" from the book "The Language Instinct" written by the eminent linguist Steven Pinker. This chapter deals with the statement in the title of the book - Is language an instinct for humans?

Language is not a cultural invention. Language exists and has existed in every culture on earth. No society can claim the title "the cradle of language." This is one of the arguments for claiming that language is innate in human beings. However, for the skeptics, the universality of language may not single-handedly prove the innate nature of language. For instance, Coca-Cola and Facebook are available almost everywhere on earth. Does this mean Coca-Cola and Facebook are innate too? (According to me, the desire to have a drink and to stay connected are innate in human beings, so the universality of language remains a conclusive proof.)

Let's look at another argument for the innate nature of language. There is evidence to show that children reinvent language, not because they are asked to do so, but because they have to. Before going into the argument, it is important here to burst two myths about child language acquisition: 
    (1) The first myth is that children learn to speak from their parents (e.g. Motherese - dogggie, pappie, ...). This is not true because parents do not explicitly teach children the rules of grammar. Chomsky reasoned that this poverty-of-input argument is the primary justification for the claim that language is innate.
    (2) The second myth is that children learn to speak by imitating their parents. If this were true, then children should not make any mistakes when they learn to talk. However, children do make mistakes when they learn language (e.g. കാവളവണ്ടി, വെക്കള്)

Now it is time to look at two real-world cases where children reinvented language. One is the development of creoles from pidgin languages such as pidgin English. The other is the development of sign languages. In both cases, phrases and crude sentences of a pseudo-language were converted to a bona fide language by the second generation of users, i.e. the children of plantation workers and deaf children, respectively.

Finally, another argument for the innate nature of language is that language is different from intelligence (or cognition). Those who suffer from Broca's aphasia have impaired language, yet their cognitive skills are sound. Those who suffer from chatterbox syndrome are language-savvy, yet their cognitive skills are severely impaired. These two cases show that the ability to speak and the ability to, say, cook food are managed by different parts of the brain and are hence different faculties.

Saturday, 4 August 2018

Talking Heads, The Language Instinct, Steven Pinker

This post is a summary of the chapter "Talking Heads" from the book "The Language Instinct" written by the eminent linguist Steven Pinker. The previous chapter in the book was a discussion on syntax. This chapter focuses more on semantics and pragmatics.

Parse trees are different from parsing. Parse trees define the syntax or structure of sentences in a language. The process of parsing defines the processing of a sentence and is thus related to the cognition and semantics of a language. Chomsky demonstrated this feature of language processing using the classic example "colorless green ideas sleep furiously". This sentence is syntactically correct. However, it is rife with absurdity and does not make sense at all. In other words, the sentence is not semantically correct.

The chapter then discusses the differences between a human being and a machine and how they interpret a natural language sentence. There are various types of sentences: (1) onion (or Russian doll) sentences, (2) garden-path sentences, and (3) ambiguous sentences.
The first two types are hard for human beings because humans have poorer memory than machines. By memory, we mean short-term memory, something similar to a machine's stack. On the other hand, the last type is easy for human beings because humans are good at decision-making. The decision-making employs different kinds of knowledge such as background knowledge, commonsense knowledge and world knowledge. A classic example that demonstrates commonsense reasoning is given below:
             Woman: I'm leaving you.
             Man: Who is he?
The converse is true for machines. Machines can process onion sentences and garden-path sentences because of the availability of memory. However, they perform very badly when it comes to ambiguous sentences. In addition, machines are too meticulous in parsing a sentence and identify interpretations that a human being will never detect (e.g. pigs in a pen). 

In real life, the task of text processing is worsened by the fact that dialogues are filled with short utterances, lots of pronouns, and fillers such as 'uh' and 'hmm'.

The chapter concludes with a discussion on pragmatics. Parsing a sentence involves more than simply understanding the sentence syntactically. A conversation between two parties can be either cooperative or adversarial. In cooperative conversation, the assumptions made by the speaker are also made by the listener. This phenomenon is absent in adversarial conversation. Legal documents demonstrate a form of adversarial conversation by clearly specifying each and every nuance of a contract. The following anecdote demonstrates the difference between cooperative and adversarial conversation. Two psychoanalysts meet in the morning. The first greets the other: "Good morning." The other wonders what he really meant by that statement.

Friday, 3 August 2018

Neural networks, explained - Janelle Shane, Physics World

This post is a summary of an article by Janelle Shane published in a 2018 issue of "Physics World."

Advantages:
  1. They are excellent at recognizing patterns in multivariate data.
  2. They are suitable for problems that are not well understood. Traditional systems were either rule-based or feature-based. However, manually coming up with rules or features is intellectually challenging and infeasible in many cases, such as face recognition. Neural networks are good at learning features automatically.
Limitations:
  1. Interpretability is an issue with neural networks. A neural network acts like a black box because humans cannot easily interpret the features learnt by the model.
  2. Results must be reviewed by human experts because neural networks might learn features that are not at all relevant to the task at hand.
  3. Neural networks might suffer from class imbalance in the training examples. This is a major issue in the case of rare events, for which it is hard to generate a sufficient number of training examples. 
  4. Neural networks might suffer from overfitting to the training examples. Overfitting can be detected by testing the network on unseen examples.
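The overfitting point can be illustrated with a deliberately extreme toy sketch (all data below are random and hypothetical): a "model" that simply memorizes its training examples scores perfectly on seen data but only at chance on held-out data, and that gap is the overfitting signal.

```python
import random

# Random features with random labels: there is nothing real to learn.
random.seed(1)
data = [([random.random() for _ in range(3)], random.choice([0, 1]))
        for _ in range(200)]
train, test = data[:150], data[150:]

# Extreme overfitting: memorize every training example verbatim.
memorized = {tuple(x): y for x, y in train}

def predict(x):
    return memorized.get(tuple(x), 0)  # unseen inputs get a blind guess

def accuracy(split):
    return sum(predict(x) == y for x, y in split) / len(split)

print(accuracy(train))  # perfect on seen data
print(accuracy(test))   # roughly chance on unseen data
```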
"Neural networks can be a very useful tool, but users must be careful not to trust them blindly. Their impressive abilities are a complement to, rather than a substitute for, critical thinking an human expertise."