my writing style
I occasionally reread what I’ve written, either to review a topic or to check grammar. After my most recent pass, I realized my writing style has specific quirks to fix. For example, I catch myself putting adverbs in front of verbs and adjectives, which makes me sound like a snake oil salesman. In some cases, my writing tends toward the polemic and pedantic rather than the informative. Because of these patterns, I decided to analyze my writing style to understand how I write.
I used spaCy, a Python NLP library, to parse my Markdown files and tag each word with its part of speech[1]. I then built two DataFrames: one holding each sentence as a sequence of part-of-speech tags, and one holding each word alongside its metadata. The results are shown below.
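A minimal sketch of that pipeline, assuming spaCy's `en_core_web_sm` model and pandas. It is not the notebook's actual code: `strip_markdown`, `doc_to_pairs`, `build_tables` and the column names are hypothetical. Tagging is kept separate from tabulation, so `build_tables` only needs sentences of `(text, POS)` pairs, which `doc_to_pairs` extracts from a spaCy `Doc` through `doc.sents` and `token.pos_`.

```python
import re

FENCE = re.compile(r"```.*?```", re.DOTALL)                    # fenced code blocks
INLINE_CODE = re.compile(r"`[^`\n]*`")                          # inline code spans
LINK = re.compile(r"!?\[([^\]]*)\]\([^)]*\)")                   # [text](url) and ![alt](src)
BLOCK_MARK = re.compile(r"^[ \t]*(?:#{1,6}|>|[-*+]|\d+\.)[ \t]+", re.MULTILINE)  # headings, quotes, list markers
EMPHASIS = re.compile(r"\*+|(?<!\w)_+|_+(?!\w)")                # *em*, **strong**, _em_

def strip_markdown(text: str) -> str:
    """Roughly reduce Markdown to prose so code and URLs are not tagged (hypothetical helper)."""
    text = FENCE.sub(" ", text)
    text = INLINE_CODE.sub(" ", text)
    text = LINK.sub(r"\1", text)          # keep link text, drop the URL
    text = BLOCK_MARK.sub("", text)
    return EMPHASIS.sub("", text)

def doc_to_pairs(doc):
    """Convert a spaCy Doc into sentences of (text, coarse POS) pairs."""
    return [[(tok.text, tok.pos_) for tok in sent] for sent in doc.sents]

def build_tables(sentences, source):
    """Return (sentence_rows, word_rows), each a list of dicts ready for pandas.DataFrame."""
    sentence_rows, word_rows = [], []
    for s_idx, sent in enumerate(sentences):
        pairs = [(text, pos) for text, pos in sent if pos != "SPACE"]  # drop whitespace tokens
        sentence_rows.append({
            "source": source,
            "sentence": s_idx,
            "text": " ".join(text for text, _ in pairs),
            "pos": " ".join(pos for _, pos in pairs),   # e.g. "DET ADV ADJ NOUN"
        })
        for text, pos in pairs:
            word_rows.append({"source": source, "sentence": s_idx,
                              "word": text.lower(), "pos": pos})
    return sentence_rows, word_rows

# Usage (needs spaCy, the en_core_web_sm model, and pandas installed):
#   import spacy, pandas as pd
#   from pathlib import Path
#   nlp = spacy.load("en_core_web_sm")
#   text = strip_markdown(Path("post.md").read_text(encoding="utf-8"))
#   sents, words = build_tables(doc_to_pairs(nlp(text)), "post.md")
#   sentences_df, words_df = pd.DataFrame(sents), pd.DataFrame(words)
```

Stripping Markdown first keeps code snippets and URLs out of the counts. The regexes are deliberately rough and will mangle edge cases such as literal asterisks in prose.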
Words

The bar plot shows how often I use each part of speech, ordered by frequency. The interesting part is not the nouns, punctuation, verbs, pronouns, or proper nouns. It is my usage of adpositions (ADP: in, to, during), determiners (DET: a, an, the), auxiliaries (AUX: is, will, should), adverbs (ADV: there, before, where), coordinating conjunctions (CCONJ: and, or, but), and subordinating conjunctions (SCONJ: if, while, that).
| ADP | DET | AUX | ADV | CCONJ | SCONJ |
|---|---|---|---|---|---|
| of : 1895 | the : 3745 | is : 1321 | so : 144 | and : 1593 | that : 375 |
| in : 930 | a : 1583 | are : 397 | just : 130 | or : 353 | how : 274 |
| to : 633 | an : 286 | be : 351 | even : 123 | but : 346 | if : 196 |
| on : 494 | this : 259 | can : 263 | more : 115 | & : 43 | because : 161 |
| for : 478 | these : 133 | was : 257 | then : 104 | + : 35 | when : 155 |
| with : 385 | some : 123 | have : 153 | here : 100 | so : 12 | as : 109 |
| as : 295 | that : 74 | do : 146 | also : 96 | either : 9 | why : 106 |
| from : 213 | all : 68 | ’s : 138 | only : 69 | yet : 6 | where : 97 |
| about : 208 | each : 67 | being : 138 | really : 64 | nor : 6 | whether : 29 |
| at : 177 | any : 66 | would : 135 | well : 57 | both : 5 | while : 28 |
The table shows my ten most-used words in each of those parts of speech. I recognize the conjunctions because I use them to say too much (CCONJ) and to go on tangents (SCONJ). My use of adverbs, especially the superlatives and intensifiers mentioned earlier, can be cut back. Leaning on passive-voice auxiliaries together with conjunctions produces long, passive conditionals. Overusing prepositions (ADP) is another bad habit.
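Both the bar plot and the table come down to counting. This is a minimal sketch that assumes word rows shaped like `{"word": ..., "pos": ...}`, and the function names are hypothetical. With pandas, `words_df["pos"].value_counts()` would give the bar plot's ordering directly.

```python
from collections import Counter, defaultdict

def pos_totals(word_rows):
    """Count tokens per part-of-speech tag, most frequent first (the bar plot)."""
    return Counter(row["pos"] for row in word_rows).most_common()

def top_words_by_pos(word_rows, tags, k=10):
    """For each requested tag, return its k most frequent words as (word, count) pairs."""
    per_tag = defaultdict(Counter)
    for row in word_rows:
        if row["pos"] in tags:
            per_tag[row["pos"]][row["word"]] += 1
    return {tag: per_tag[tag].most_common(k) for tag in tags}

def as_markdown_table(top, tags):
    """Render the per-tag lists side by side, padding shorter columns with blanks."""
    depth = max((len(top[t]) for t in tags), default=0)
    lines = ["| " + " | ".join(tags) + " |", "|" + "---|" * len(tags)]
    for i in range(depth):
        cells = [f"{top[t][i][0]} : {top[t][i][1]}" if i < len(top[t]) else "" for t in tags]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)
```

Calling `as_markdown_table(top_words_by_pos(rows, tags), tags)` with `tags = ["ADP", "DET", "AUX", "ADV", "CCONJ", "SCONJ"]` produces a table in the same `word : count` layout as the one above.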
Sentences
I had found some problem words to watch for: “that”, “but”, “or”, “just”, “really”, “more”, and “this”. My next goal was to identify the specific sentences where I over-chain adpositions, write run-on sentences, or write passively.
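One way to shortlist those sentences is to score each tagged sentence against a few rules. This is a minimal sketch, assuming each sentence arrives as parallel lists of words and coarse tags. The thresholds `max_adp` and `max_tokens` are arbitrary placeholders, not values from the analysis, and counting all adpositions is only a proxy for chaining them.

```python
PROBLEM_WORDS = {"that", "but", "or", "just", "really", "more", "this"}  # the list above

def flag_sentence(words, tags, max_adp=4, max_tokens=35):
    """Return reasons a tagged sentence deserves a second look.

    max_adp and max_tokens are arbitrary placeholder thresholds.
    """
    reasons = []
    adp = tags.count("ADP")              # total adpositions as a proxy for chaining
    if adp > max_adp:
        reasons.append(f"{adp} adpositions")
    if len(words) > max_tokens:          # long sentences are run-on candidates
        reasons.append(f"{len(words)} tokens")
    hits = sorted({w.lower() for w in words} & PROBLEM_WORDS)
    if hits:
        reasons.append("problem words: " + ", ".join(hits))
    return reasons
```

This sketch does not detect passive voice, because coarse tags cannot tell “is written” from “is writing” (both are AUX VERB). spaCy's English models also assign dependency labels, and passive auxiliaries are labeled `auxpass`, which would be a better signal.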
Adverbs[2]
I’m going to remove all intensifiers. Again, they make me sound like a snake oil salesman.
the sometimes often begrudgingly correct articles
(DET ADV ADV ADV ADJ NOUN)
just the content
(ADV DET NOUN)
UNICEF’s most efficient lifesaving programs
(PROPN PART ADV ADJ NOUN NOUN)
an entirely absurd topic
(DET ADV ADJ NOUN)
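In most of the examples above, one or more adverbs sit directly in front of an adjective. This is a minimal sketch that flags such runs from the tag sequence, plus any adverb on an outright blocklist. The `BLOCKED` and `ALLOWED` sets are assumptions taken from the intensifiers and temporal adverbs named in footnote 2, not complete lists.

```python
BLOCKED = {"just", "really"}           # intensifiers to flag anywhere (assumed list)
ALLOWED = {"so", "then", "earlier"}    # temporal adverbs that are permitted (assumed list)

def flag_adverbs(words, tags):
    """Return (index, word) for adverbs that stack before an adjective or are blocked outright."""
    flagged = []
    for i, (word, tag) in enumerate(zip(words, tags)):
        if tag != "ADV" or word.lower() in ALLOWED:
            continue
        # skip past the rest of this adverb run to see what it modifies
        j = i
        while j < len(tags) and tags[j] == "ADV":
            j += 1
        modifies_adj = j < len(tags) and tags[j] == "ADJ"
        if modifies_adj or word.lower() in BLOCKED:
            flagged.append((i, word))
    return flagged
```

For “the sometimes often begrudgingly correct articles”, all three adverbs are flagged because the run ends at an ADJ. “just the content” is caught only through the blocklist.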
Auxiliary Verbs & Subjunctive Conditionals[3]
These verbs signal writing in the passive voice, using modal words, and hedging. Modal words, in turn, lead directly to an overuse of subjunctive conditionals.
It would seem that the results are inconclusive
(PRON AUX VERB SCONJ DET NOUN AUX ADJ)
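The example above has a recognizable shape: a modal auxiliary, a verb, then “that” tagged SCONJ. This is a minimal sketch that finds that shape with a regular expression over `word/TAG` tokens. The `MODALS` set is an assumed list, and allowing at most one adverb between the modal and the verb is a simplification.

```python
import re

MODALS = {"would", "could", "might", "may", "should"}  # assumed list of hedging modals

# "word/TAG" tokens: an AUX, an optional ADV, a VERB, then "that" tagged SCONJ
HEDGE = re.compile(r"(?<!\S)(\S+)/AUX (?:\S+/ADV )?\S+/VERB that/SCONJ(?!\S)")
TAG_SUFFIX = re.compile(r"/[A-Z]+(?=\s|$)")

def find_hedges(words, tags):
    """Return hedging phrases such as 'would seem that' found in one sentence."""
    tagged = " ".join(f"{w}/{t}" for w, t in zip(words, tags))
    return [TAG_SUFFIX.sub("", m.group(0))      # strip the /TAG suffixes for display
            for m in HEDGE.finditer(tagged)
            if m.group(1).lower() in MODALS]    # only modal auxiliaries count as hedges
```

Filtering on the captured auxiliary keeps non-modal constructions such as “is said that” out of the results.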
Linting + Checklist
As a result of this experiment, I’ve started taking linting seriously for my blog. I previously wrote Python scripts to check specific rules, such as “every quote must have a figcaption under it” or “every word in quotes must also be italicized”. The problem is that these scripts can’t do what grammar-informed tools do.
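For contrast, a rule like the italics check is just a pattern matched line by line. This is a minimal sketch, assuming Markdown `*emphasis*` and curly quotes. `check_quotes_italicized` is a hypothetical name and not one of the original scripts.

```python
import re

# A curly-quoted phrase not wrapped in *...* emphasis violates the rule.
UNITALICIZED_QUOTE = re.compile(r"(?<!\*)“[^”\n]+”(?!\*)")

def check_quotes_italicized(markdown: str):
    """Yield (line number, offending text) for quoted phrases missing italics."""
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        for m in UNITALICIZED_QUOTE.finditer(line):
            yield lineno, m.group(0)
```

A rule like this only sees characters. It cannot ask whether a word is an adverb or an auxiliary, and that gap is what a part-of-speech-aware linter would fill.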
Grammar-informed linters formally parse the source “language” and then run rules against the result[4]. I haven’t been linting actively because no existing linter did what I needed. Markdownlint is useful, but it targets Markdown, not words[5]. Proselint is good at catching misused words, but it targets words, not parts of speech.
My blog pre/post-upload checklist:
- Markdownlint
- Proselint
- PoS linter
- Reread
The file for the Jupyter Notebook is here
[1] It’s interesting that the parsing is done with specialized ML models instead of formal grammar-based parsers. Reviewing spaCy made me reflect on what I already know about LL/LR parsing, BNF, and compiling, though I had never realized that parser generators are specialized transpilers/compilers.
[2] Temporal adverbs like “so”, “then”, and “earlier” are permitted. What I don’t like are intensifiers like “just” or “really”.
[3] Auxiliary verbs are allowed, except for modal and passive auxiliaries like “would” or “being”. Any subordinating conjunction is fine except for “that”.
[4] My Python scripts do use RegEx, so they’re not far from a BNF parser generator.
[5] It’s interesting how you can load/exec another file in Ruby.