My Writing Style

I occasionally reread what I have written to review a topic or check grammar. After my most recent round, I realized I have a specific writing style with quirks to fix. For example, I notice myself putting adverbs before verbs, which makes me sound like a snake oil salesman. In some cases, my writing tends toward the polemic and pedantic rather than the informative. Because of these patterns, I decided to analyze my writing style to understand how I write.

I used spaCy, a Python NLP library, to parse my Markdown files and tag the words with their parts of speech[1]. I then created one DataFrame containing the sentences in PoS form and another containing each word alongside some of its metadata. The results are shown below.
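A minimal sketch of that tagging step, assuming spaCy's small English model `en_core_web_sm` is installed and that the posts live under a hypothetical `posts/` directory. The function names and row keys are illustrative and not taken from the notebook.

```python
from pathlib import Path


def tag_text(nlp, text, source=""):
    """Return (word_rows, sentence_rows) for one document.

    `nlp` is any callable that behaves like a loaded spaCy pipeline.
    """
    doc = nlp(text)
    word_rows, sentence_rows = [], []
    for sent_id, sent in enumerate(doc.sents):
        # Skip whitespace-only tokens so they do not pollute the counts.
        tokens = [tok for tok in sent if not tok.is_space]
        for tok in tokens:
            word_rows.append({
                "source": source,
                "sentence": sent_id,
                "text": tok.text,
                "lemma": tok.lemma_,
                "pos": tok.pos_,
            })
        sentence_rows.append({
            "source": source,
            "sentence": sent_id,
            "pos": " ".join(tok.pos_ for tok in tokens),
        })
    return word_rows, sentence_rows


def tag_posts(directory="posts", model="en_core_web_sm"):
    """Tag every Markdown file under `directory` (a hypothetical path)."""
    import spacy  # imported here so tag_text stays usable without spaCy

    nlp = spacy.load(model)  # assumes the model has been downloaded
    words, sentences = [], []
    for path in sorted(Path(directory).glob("*.md")):
        # Markdown markup is not stripped here, so it gets tagged too.
        w, s = tag_text(nlp, path.read_text(encoding="utf-8"), source=path.name)
        words.extend(w)
        sentences.extend(s)
    return words, sentences
```

Passing each returned list to `pandas.DataFrame` would give one table of words with metadata and one table of sentences in PoS form.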

Words

Tag    Description                Examples
ADJ    adjective                  big, old, green, incomprehensible, first
ADP    adposition                 in, to, during
ADV    adverb                     very, tomorrow, down, where, there
AUX    auxiliary                  is, has (done), will (do), should (do)
CONJ   conjunction                and, or, but
CCONJ  coordinating conjunction   and, or, but
DET    determiner                 a, an, the
INTJ   interjection               psst, ouch, bravo, hello
NOUN   noun                       girl, cat, tree, air, beauty
NUM    numeral                    1, 2017, one, seventy-seven, IV, MMXIV
PART   particle                   ’s, not
PRON   pronoun                    I, you, he, she, myself, themselves, somebody
PROPN  proper noun                Mary, John, London, NATO, HBO
PUNCT  punctuation                ., (, ), ?
SCONJ  subordinating conjunction  if, while, that
SYM    symbol                     $, %, §, ©, +, −, ×, ÷, =, :), 😝
VERB   verb                       run, runs, running, eat, ate, eating
X      other                      sfpksdpsxmsa
SPACE  space

The bar plot shows my usage of the different parts of speech, ordered by frequency. The interesting part is not the nouns, punctuation, verbs, pronouns, or proper nouns. It is my usage of adpositions (ADP: in, to, during), determiners (DET: a, an, the), auxiliaries (AUX: is, will, should), adverbs (ADV: there, before, where), coordinating conjunctions (CCONJ: and, or, but), and subordinating conjunctions (SCONJ: if, while, that).

Rank  ADP           DET         AUX          ADV          CCONJ        SCONJ
1     of (1895)     the (3745)  is (1321)    so (144)     and (1593)   that (375)
2     in (930)      a (1583)    are (397)    just (130)   or (353)     how (274)
3     to (633)      an (286)    be (351)     even (123)   but (346)    if (196)
4     on (494)      this (259)  can (263)    more (115)   & (43)       because (161)
5     for (478)     these (133) was (257)    then (104)   + (35)       when (155)
6     with (385)    some (123)  have (153)   here (100)   so (12)      as (109)
7     as (295)      that (74)   do (146)     also (96)    either (9)   why (106)
8     from (213)    all (68)    ’s (138)     only (69)    yet (6)      where (97)
9     about (208)   each (67)   being (138)  really (64)  nor (6)      whether (29)
10    at (177)      any (66)    would (135)  well (57)    both (5)     while (28)

The table shows my ten most-used words in each of those parts of speech. I recognize the conjunctions because I use them to say too much (CCONJ) and to go on tangents (SCONJ). My use of adverbs, the superlatives and intensifiers mentioned earlier, can lessen. Overusing auxiliary verbs alongside conjunctions leads to passive voice and long conditionals. My last critique, and my worst habit, is the overuse of prepositions. All of these point to my “stream of consciousness” writing style.
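A top-ten table like this can be computed with the standard library alone. The sketch below assumes the tagged words are available as `(text, pos)` pairs; the function name and the sample rows are made up for illustration.

```python
from collections import Counter, defaultdict


def top_words_by_pos(rows, n=10):
    """Map each PoS tag to its n most frequent lowercased words."""
    counts = defaultdict(Counter)
    for text, pos in rows:
        counts[pos][text.lower()] += 1
    return {pos: counter.most_common(n) for pos, counter in counts.items()}


# Hand-tagged sample rows, not real output from the blog.
rows = [
    ("The", "DET"), ("cat", "NOUN"), ("sat", "VERB"), ("on", "ADP"),
    ("the", "DET"), ("mat", "NOUN"), ("of", "ADP"), ("a", "DET"),
]
print(top_words_by_pos(rows, n=2)["DET"])  # [('the', 2), ('a', 1)]
```

Lowercasing before counting merges "The" and "the", which matters for determiners and conjunctions that often start a sentence.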

Sentences

I had discovered some problem words to look out for: “that”, “but”, “or”, “just”, “really”, “more”, and “this”. My next goal was to identify specific sentences where I over-chain adpositions, argue against the sentence, create near run-on sentences, and write passively.
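One way to find such sentences is to count tags per sentence and compare against thresholds. The sketch below is an assumption about how that could look: the thresholds and flag names are hypothetical, and passive voice is left out because it needs dependency labels rather than PoS tags alone.

```python
def sentence_flags(tagged, max_adp=4, max_conj=3, max_tokens=40):
    """Return the names of the checks a tagged sentence fails.

    `tagged` is a list of (text, pos) pairs for one sentence.
    The default thresholds are arbitrary starting points.
    """
    pos_tags = [pos for _, pos in tagged]
    words = [pos for pos in pos_tags if pos not in ("PUNCT", "SPACE")]
    flags = []
    if pos_tags.count("ADP") > max_adp:
        flags.append("adposition-chain")
    if pos_tags.count("CCONJ") + pos_tags.count("SCONJ") > max_conj:
        flags.append("conjunction-heavy")
    if len(words) > max_tokens:
        flags.append("too-long")
    return flags


# Hand-tagged example with five adpositions.
chained = [
    ("results", "NOUN"), ("of", "ADP"), ("tests", "NOUN"), ("in", "ADP"),
    ("labs", "NOUN"), ("on", "ADP"), ("hills", "NOUN"), ("at", "ADP"),
    ("edges", "NOUN"), ("of", "ADP"), ("towns", "NOUN"),
]
print(sentence_flags(chained))  # ['adposition-chain']
```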

Adverbs[2]

I’m going to remove all intensifiers; again, they make me sound like a snake oil salesman.

the sometimes often begrudgingly correct articles

just the content

UNICEF’s most efficient lifesaving programs

an entirely absurd topic
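Since only some adverbs are the problem, a word list on top of the ADV tag is one possible filter. The list below is a hypothetical starting set, not the author's, and the example phrase is tagged by hand.

```python
# Hypothetical intensifier list; extend it as new offenders turn up.
INTENSIFIERS = {"very", "really", "just", "most", "entirely", "extremely", "quite"}


def find_intensifiers(tagged):
    """Return (index, word) for each adverb that is on the intensifier list."""
    return [
        (i, text)
        for i, (text, pos) in enumerate(tagged)
        if pos == "ADV" and text.lower() in INTENSIFIERS
    ]


phrase = [("an", "DET"), ("entirely", "ADV"), ("absurd", "ADJ"), ("topic", "NOUN")]
print(find_intensifiers(phrase))  # [(1, 'entirely')]
```

Adverbs that are not on the list, such as "then" or "earlier", pass through untouched.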

Auxiliary Verbs & Subjunctive Conditionals[3]

These verbs indicate writing in a “passive voice”, using “modal words”, and employing “hedging”. Using modal words leads directly to an abuse of subjunctive conditionals.

It would seem that the results are inconclusive
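Modal auxiliaries can be separated from other auxiliaries with the fine-grained tag, which spaCy's English models expose as `token.tag_` and which uses the Penn Treebank label `MD` for modals. The sketch below assumes each token is given as a `(text, pos, tag)` triple; the example is tagged by hand.

```python
def find_modals(tagged):
    """Return the modal auxiliaries in one sentence.

    `tagged` is a list of (text, pos, tag) triples, where `tag` is the
    fine-grained tag; "MD" marks a modal such as "would" or "can".
    """
    return [text for text, pos, tag in tagged if pos == "AUX" and tag == "MD"]


sentence = [
    ("It", "PRON", "PRP"), ("would", "AUX", "MD"), ("seem", "VERB", "VB"),
    ("that", "SCONJ", "IN"), ("the", "DET", "DT"), ("results", "NOUN", "NNS"),
    ("are", "AUX", "VBP"), ("inconclusive", "ADJ", "JJ"),
]
print(find_modals(sentence))  # ['would']
```

The non-modal auxiliary "are" is not reported, only "would".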

Linting + Checklist

As a result of this experiment, I have begun taking linting seriously for my blog. I had already written Python scripts to check specific rules, such as “every quote must have a figcaption under it” or “all words in quotes must also be italicized”. They were good for their use cases.

Grammar-informed linters have formalized capabilities for parsing the source “language”[4] and running rules on it. And rules exist for a reason. I haven’t been actively linting because no existing linter did what I needed. Markdownlint is a useful linter, but it targets Markdown and not words[5]. Proselint is good at identifying wrong word usage, but it targets words and not parts of speech.

Having a PoS tagger for English now allows me to customize a linter for the blog!
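As a rough idea of what such a PoS-based linter could look like, here is a small rule registry. It is a guess at a workable shape, not the actual PoSlinter: the decorator, the two rules, and the message format are all hypothetical.

```python
RULES = {}


def rule(name):
    """Register a check that takes one tagged sentence and returns a bool."""
    def register(func):
        RULES[name] = func
        return func
    return register


@rule("double-adverb")
def double_adverb(tagged):
    # Two adverbs in a row, as in "sometimes often".
    tags = [pos for _, pos in tagged]
    return any(a == b == "ADV" for a, b in zip(tags, tags[1:]))


@rule("starts-with-conjunction")
def starts_with_conjunction(tagged):
    return bool(tagged) and tagged[0][1] == "CCONJ"


def lint(sentences):
    """Yield 'sentence N: rule-name' for every rule a sentence triggers."""
    for number, tagged in enumerate(sentences, start=1):
        for name, check in RULES.items():
            if check(tagged):
                yield f"sentence {number}: {name}"


# Hand-tagged sentences as (text, pos) pairs.
sentences = [
    [("But", "CCONJ"), ("it", "PRON"), ("works", "VERB")],
    [("the", "DET"), ("sometimes", "ADV"), ("often", "ADV"),
     ("correct", "ADJ"), ("articles", "NOUN")],
]
for message in lint(sentences):
    print(message)
```

Keeping each rule as a small function makes it cheap to add or retire a rule as writing habits change.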

My blog pre/post-upload[5] checklist:

  1. Markdownlint
  2. Proselint
  3. PoSlinter
  4. Reread

The file for the Jupyter Notebook is here


  1. It’s interesting how the parsing is done with specialized ML models instead of formalized grammar parsing tools. Reviewing the library made me reflect on what I already know about LL/LR parsing, BNF, and compiling, but I had never realized parser generators were specialized transpilers/compilers.

  2. Temporal adverbs like “so”, “then”, and “earlier” are permitted. What I don’t like are intensifiers.

  3. Auxiliary verbs are allowed except for modal auxiliary verbs. Any subjunctive is fine except for “just”. 

  4. My Python scripts do have RegEx, so they’re not far from a BNF parser generator. 

  5. It’s interesting how you can load/exec another file in Ruby.