my writing style
I occasionally read through what I previously wrote to review a topic or check grammar. After my most recent round, I realized I have a specific writing style with quirks to fix. For example, I notice myself putting adverbs before verbs, which makes me sound like a snake oil salesman. In some cases, my writing tends toward the polemic and pedantic rather than the informative. Because of these patterns, I decided to analyze my writing style to understand how I write.
I used spaCy, a Python NLP library, to parse my Markdown files and tag the words with their parts of speech [1]. I then created one DataFrame containing the sentences in PoS form and another containing each word alongside some of its metadata. The results are shown below.
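A minimal sketch of what this step could look like, assuming spaCy's small English model `en_core_web_sm` and pandas are installed. The helper names (`token_rows`, `sentence_tags`, `load_frames`) and the chosen columns are illustrative and not taken from the notebook.

```python
from collections import namedtuple

# One row per token: the word plus a little metadata.
TokenRow = namedtuple("TokenRow", "text lemma pos sent_id")


def token_rows(doc):
    """Flatten a parsed document into one TokenRow per non-space token."""
    rows = []
    for sent_id, sent in enumerate(doc.sents):
        for tok in sent:
            if tok.pos_ == "SPACE":
                continue
            rows.append(TokenRow(tok.text, tok.lemma_, tok.pos_, sent_id))
    return rows


def sentence_tags(doc):
    """Return each sentence as its text plus its sequence of PoS tags."""
    return [
        (sent.text.strip(), [tok.pos_ for tok in sent if tok.pos_ != "SPACE"])
        for sent in doc.sents
    ]


def load_frames(markdown_text):
    """Parse text with spaCy and build the two DataFrames."""
    # Imported here so the helpers above stay usable without these packages.
    import pandas as pd
    import spacy

    nlp = spacy.load("en_core_web_sm")  # assumed model, not confirmed by the post
    doc = nlp(markdown_text)
    words = pd.DataFrame(token_rows(doc))
    sentences = pd.DataFrame(sentence_tags(doc), columns=["text", "tags"])
    return words, sentences
```

Calling `load_frames(open("post.md").read())` (with a hypothetical file name) would return the word table and the sentence table in that order.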
Words
| Tag | Description | Examples |
|---|---|---|
| ADJ | adjective | big, old, green, incomprehensible, first |
| ADP | adposition | in, to, during |
| ADV | adverb | very, tomorrow, down, where, there |
| AUX | auxiliary | is, has (done), will (do), should (do) |
| CONJ | conjunction | and, or, but |
| CCONJ | coordinating conjunction | and, or, but |
| DET | determiner | a, an, the |
| INTJ | interjection | psst, ouch, bravo, hello |
| NOUN | noun | girl, cat, tree, air, beauty |
| NUM | numeral | 1, 2017, one, seventy-seven, IV, MMXIV |
| PART | particle | ’s, not |
| PRON | pronoun | I, you, he, she, myself, themselves, somebody |
| PROPN | proper noun | Mary, John, London, NATO, HBO |
| PUNCT | punctuation | ., (, ), ? |
| SCONJ | subordinating conjunction | if, while, that |
| SYM | symbol | $, %, §, ©, +, −, ×, ÷, =, :), 😝 |
| VERB | verb | run, runs, running, eat, ate, eating |
| X | other | sfpksdpsxmsa |
| SPACE | space | |

The barplot shows my usage of the different parts of speech, ordered by frequency. The interesting part is not the nouns, punctuation, verbs, pronouns, or proper nouns. It is my usage of adpositions (ADP: in, to, during), determiners (DET: a, an, the), auxiliaries (AUX: is, will, should), adverbs (ADV: there, before, where), coordinating conjunctions (CCONJ: and, or, but), and subordinating conjunctions (SCONJ: if, while, that).
| ADP | DET | AUX | ADV | CCONJ | SCONJ |
|---|---|---|---|---|---|
| of (1895) | the (3745) | is (1321) | so (144) | and (1593) | that (375) |
| in (930) | a (1583) | are (397) | just (130) | or (353) | how (274) |
| to (633) | an (286) | be (351) | even (123) | but (346) | if (196) |
| on (494) | this (259) | can (263) | more (115) | & (43) | because (161) |
| for (478) | these (133) | was (257) | then (104) | + (35) | when (155) |
| with (385) | some (123) | have (153) | here (100) | so (12) | as (109) |
| as (295) | that (74) | do (146) | also (96) | either (9) | why (106) |
| from (213) | all (68) | ’s (138) | only (69) | yet (6) | where (97) |
| about (208) | each (67) | being (138) | really (64) | nor (6) | whether (29) |
| at (177) | any (66) | would (135) | well (57) | both (5) | while (28) |
The table shows my top 10 words in each of those parts of speech. I recognize the conjunctions because I use them to say too much (CCONJ) and go on tangents (SCONJ). My use of adverbs, the superlatives and intensifiers mentioned earlier, can lessen. Overusing auxiliary verbs together with conjunctions leads to passive, long conditionals. The last critique, and my worst habit, is the overuse of prepositions. All of these point to my “stream of consciousness” writing style.
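A top-10 table like this can be computed with a counter per tag. The sketch below assumes the tokens are available as `(word, pos)` pairs; the function name and the tiny hand-tagged sample are illustrative, not the real data.

```python
from collections import Counter, defaultdict


def top_words_by_pos(tagged_words, tags, n=10):
    """Return the n most common lowercased words for each requested PoS tag."""
    counts = defaultdict(Counter)
    for word, pos in tagged_words:
        if pos in tags:
            counts[pos][word.lower()] += 1
    return {pos: counts[pos].most_common(n) for pos in tags}


# Tiny hand-tagged sample standing in for the real token table.
sample = [
    ("The", "DET"), ("results", "NOUN"), ("of", "ADP"), ("the", "DET"),
    ("test", "NOUN"), ("are", "AUX"), ("in", "ADP"), ("and", "CCONJ"),
    ("of", "ADP"), ("a", "DET"),
]
print(top_words_by_pos(sample, ["ADP", "DET"], n=2))
# {'ADP': [('of', 2), ('in', 1)], 'DET': [('the', 2), ('a', 1)]}
```

Lowercasing before counting merges "The" and "the", which is probably what a style review wants.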
Sentences
I discovered some problem words to look out for: “that”, “but”, “or”, “just”, “really”, “more”, and “this”. My next goal was to identify specific sentences where I over-chain adpositions, argue against my own sentence, create near run-on sentences, or write passively.
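One way to find such sentences is to score each sentence's PoS sequence. This is a rough sketch: the thresholds (`max_adp`, `max_tokens`) are arbitrary assumptions, and counting adpositions is only a stand-in for "over-chaining".

```python
def flag_sentence(tags, max_adp=3, max_tokens=30):
    """Return a list of warnings for one sentence given its PoS tag sequence."""
    warnings = []
    adp_count = tags.count("ADP")
    if adp_count > max_adp:
        warnings.append(f"{adp_count} adpositions (limit {max_adp})")
    # Punctuation does not count toward sentence length.
    words = [t for t in tags if t != "PUNCT"]
    if len(words) > max_tokens:
        warnings.append(f"{len(words)} words (limit {max_tokens})")
    return warnings


# Hand-written tags for: "The result of the analysis of the style of the posts in the blog."
chained = ["DET", "NOUN", "ADP", "DET", "NOUN", "ADP", "DET", "NOUN",
           "ADP", "DET", "NOUN", "ADP", "DET", "NOUN", "PUNCT"]
print(flag_sentence(chained))  # ['4 adpositions (limit 3)']
```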
Adverbs [2]
I’m going to remove all intensifiers; again, they make me sound like a snake oil salesman.
the sometimes often begrudgingly correct articles
just the content
UNICEF’s most efficient lifesaving programs
an entirely absurd topic
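Phrases like these can be caught by combining the PoS tag with a word list. A minimal sketch, assuming tokens arrive as `(word, pos)` pairs; the `INTENSIFIERS` set is a hypothetical starting list, not a complete one.

```python
# Hypothetical word list; extend it with whatever the counts surface.
INTENSIFIERS = {"just", "really", "very", "entirely", "most", "extremely"}


def find_intensifiers(tagged_words, intensifiers=INTENSIFIERS):
    """Return (index, word) for every adverb that appears in the intensifier list."""
    return [
        (i, word)
        for i, (word, pos) in enumerate(tagged_words)
        if pos == "ADV" and word.lower() in intensifiers
    ]


# Hand-tagged version of one of the examples above.
phrase = [("an", "DET"), ("entirely", "ADV"), ("absurd", "ADJ"), ("topic", "NOUN")]
print(find_intensifiers(phrase))  # [(1, 'entirely')]
```

Checking the tag as well as the word avoids false positives such as "just" used as an adjective ("a just cause").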
Auxiliary Verbs & Subjunctive Conditionals [3]
These verbs indicate writing in a “passive voice”, using “modal words”, and employing “hedging”. Using modal words leads directly to an abuse of subjunctive conditionals.
It would seem that the results are inconclusive
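Modal auxiliaries can be separated from other auxiliaries with the fine-grained tag: spaCy's English models use the Penn Treebank tag `MD` for modals. The sketch below assumes tokens as `(word, pos, tag)` triples; the tags in the sample are written by hand as I would expect the tagger to emit them, not actual tagger output.

```python
def find_modals(tagged_words):
    """Return the words tagged as modal auxiliaries (coarse AUX, fine-grained MD)."""
    return [word for word, pos, tag in tagged_words if pos == "AUX" and tag == "MD"]


# Hand-tagged version of the example sentence above.
sentence = [
    ("It", "PRON", "PRP"), ("would", "AUX", "MD"), ("seem", "VERB", "VB"),
    ("that", "SCONJ", "IN"), ("the", "DET", "DT"), ("results", "NOUN", "NNS"),
    ("are", "AUX", "VBP"), ("inconclusive", "ADJ", "JJ"),
]
print(find_modals(sentence))  # ['would']
```

Non-modal auxiliaries such as "are" pass through, so only the hedging "would" is reported.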
Linting + Checklist
As a result of this experiment, I have begun taking linting seriously for my blog. I had already written Python scripts to check specific rules, like “every quote must have a figcaption under it” or “all words in quotes must also be italicized”. They were good for their use cases.
Grammar-informed linters have formalized capabilities for parsing the source “language” and running rules on it [4]. And rules exist for a reason. I haven’t been actively linting because no existing linter did what I needed. Markdownlint is a useful linter, but it targets Markdown and not words [5]. Proselint is good at identifying wrong word usage, but it targets words and not parts of speech.
Having a PoS tagger for English now allows me to build a custom linter for the blog!
My blog pre/post-upload checklist [5]:
- Markdownlint
- Proselint
- PoSlinter
- Reread
The file for the Jupyter Notebook is here
1. It’s interesting that the parsing is done with specialized ML models instead of formalized grammar parsing tools. Reviewing the library made me reflect on what I already know about LL/LR parsing, BNF, and compiling, but I had never realized parser generators were specialized transpilers/compilers.
2. Temporal adverbs like “so”, “then”, and “earlier” are permitted. What I don’t like are intensifiers.
3. Auxiliary verbs are allowed except for modal auxiliary verbs. Any subjunctive is fine except for “just”.
4. My Python scripts do use RegEx, so they’re not far from a BNF parser generator.
5. It’s interesting how you can load/exec another file in Ruby.