my writing style

I occasionally reread what I previously wrote, either to review a topic or to check grammar. After my most recent round, I realized my writing style has specific quirks to fix. For example, I notice myself putting adverbs before verbs, which makes me sound like a snake oil salesman. In some cases, my writing tends toward the polemical and pedantic rather than the informative. Because of these patterns, I decided to analyze my writing style to understand how I write.

I used spaCy, a Python NLP library, to parse my Markdown files and tag each word with its part of speech[1]. I then created two DataFrames: one containing each sentence in part-of-speech form, and one containing each word alongside its metadata. The results are shown below.
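A minimal sketch of how the two tables could be assembled, assuming each sentence has already been tagged as a list of `(text, pos)` pairs. With spaCy, those pairs would come from something like `nlp = spacy.load("en_core_web_sm")`, then iterating over `doc.sents` and reading `token.text` and `token.pos_`. The model name, the hand-tagged sample, and the row fields below are my assumptions, not the notebook's actual code. The rows are plain dicts so the sketch runs without pandas; `pandas.DataFrame(rows)` would turn either list into a DataFrame.

```python
# Hand-tagged sample standing in for spaCy output (assumption): each sentence
# is a list of (text, pos) pairs, like (token.text, token.pos_) in doc.sents.
TAGGED_SENTENCES = [
    [("I", "PRON"), ("just", "ADV"), ("wrote", "VERB"), ("this", "PRON"), (".", "PUNCT")],
    [("It", "PRON"), ("is", "AUX"), ("really", "ADV"), ("long", "ADJ"), (".", "PUNCT")],
]


def sentence_rows(sentences):
    """One row per sentence: its text and its part-of-speech sequence."""
    return [
        {
            "sentence_id": i,
            "text": " ".join(text for text, _ in sent),
            "pos_sequence": " ".join(pos for _, pos in sent),
        }
        for i, sent in enumerate(sentences)
    ]


def word_rows(sentences):
    """One row per token, with sentence id and position kept as metadata."""
    return [
        {"sentence_id": i, "position": j, "word": text.lower(), "pos": pos}
        for i, sent in enumerate(sentences)
        for j, (text, pos) in enumerate(sent)
    ]


if __name__ == "__main__":
    for row in sentence_rows(TAGGED_SENTENCES):
        print(row["pos_sequence"])
```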

Words

Figure: Frequency bar plot of my part-of-speech usage

The bar plot shows my part-of-speech usage, ordered by frequency. The interesting part is not the nouns, punctuation, verbs, pronouns, or proper nouns. It is my usage of adpositions (ADP: in, to, during), determiners (DET: a, an, the), auxiliaries (AUX: is, will, should), adverbs (ADV: there, before, where), coordinating conjunctions (CCONJ: and, or, but), and subordinating conjunctions (SCONJ: if, while, that).
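The ordering in the bar plot amounts to counting tags and sorting by count. A minimal sketch with `collections.Counter`, assuming a flat list of tags like the `pos` column of the word table; the text-based bar chart stands in for the real plot, and the sample tags are made up for illustration.

```python
from collections import Counter


def pos_frequencies(pos_tags):
    """Return (tag, count) pairs, most frequent first."""
    return Counter(pos_tags).most_common()


def text_bar_plot(freqs, width=40):
    """Render ordered (tag, count) pairs as a plain-text bar chart."""
    if not freqs:
        return ""
    top = freqs[0][1]
    lines = []
    for tag, count in freqs:
        # Scale each bar relative to the most frequent tag; keep at least one mark.
        bar = "#" * max(1, round(width * count / top))
        lines.append(f"{tag:<6}{bar} {count}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = ["NOUN", "ADP", "NOUN", "DET", "ADP", "NOUN"]  # made-up tags
    print(text_bar_plot(pos_frequencies(sample), width=6))
```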

| ADP | DET | AUX | ADV | CCONJ | SCONJ |
|---|---|---|---|---|---|
| of: 1895 | the: 3745 | is: 1321 | so: 144 | and: 1593 | that: 375 |
| in: 930 | a: 1583 | are: 397 | just: 130 | or: 353 | how: 274 |
| to: 633 | an: 286 | be: 351 | even: 123 | but: 346 | if: 196 |
| on: 494 | this: 259 | can: 263 | more: 115 | &: 43 | because: 161 |
| for: 478 | these: 133 | was: 257 | then: 104 | +: 35 | when: 155 |
| with: 385 | some: 123 | have: 153 | here: 100 | so: 12 | as: 109 |
| as: 295 | that: 74 | do: 146 | also: 96 | either: 9 | why: 106 |
| from: 213 | all: 68 | ’s: 138 | only: 69 | yet: 6 | where: 97 |
| about: 208 | each: 67 | being: 138 | really: 64 | nor: 6 | whether: 29 |
| at: 177 | any: 66 | would: 135 | well: 57 | both: 5 | while: 28 |

The table shows the top 10 words I use in each of those parts of speech. The conjunctions stand out because I use them to say too much (CCONJ) and to go on tangents (SCONJ). I can cut back on adverbs, the superlatives and intensifiers mentioned earlier. Overusing passive-voice auxiliaries together with conjunctions leads to long, passive conditionals. Overusing prepositions is another bad habit.
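A table like this can be produced by grouping word rows by tag before counting, which is also why the same word (“that”, “so”, “as”) can appear under more than one part of speech. A minimal sketch, assuming word rows shaped as `{"word": ..., "pos": ...}` dicts; the function name and sample rows are hypothetical. In pandas, `df.groupby("pos")["word"].value_counts()` gives similar counts.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows; "that" is tagged both SCONJ and DET.
SAMPLE_ROWS = [
    {"word": "that", "pos": "SCONJ"},
    {"word": "that", "pos": "DET"},
    {"word": "of", "pos": "ADP"},
    {"word": "of", "pos": "ADP"},
    {"word": "in", "pos": "ADP"},
    {"word": "that", "pos": "SCONJ"},
]


def top_words_by_pos(word_rows, tags, n=10):
    """Return {tag: [(word, count), ...]} with the n most common words per tag."""
    counts = defaultdict(Counter)
    for row in word_rows:
        if row["pos"] in tags:
            counts[row["pos"]][row["word"]] += 1
    # Tags with no matching rows get an empty list.
    return {tag: counts[tag].most_common(n) for tag in tags}


if __name__ == "__main__":
    print(top_words_by_pos(SAMPLE_ROWS, ["ADP", "DET", "SCONJ"], n=1))
```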

Sentences

I had found some problem words to watch out for: “that”, “but”, “or”, “just”, “really”, “more”, and “this”. My next goal was to identify the specific sentences where I over-chain adpositions, write run-on sentences, or write passively.
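One way to find those sentences is to scan each tagged sentence for the problem words, count its adpositions, and check its length. A minimal sketch, assuming `(text, pos)` pairs per sentence and a hand-tagged sample; the thresholds `MAX_ADP` and `MAX_TOKENS` are arbitrary placeholders, not values from my analysis. Passive voice is left out: spaCy's English models mark it with dependency labels such as `nsubjpass` and `auxpass`, which this tag-only sketch does not use.

```python
PROBLEM_WORDS = {"that", "but", "or", "just", "really", "more", "this"}
MAX_ADP = 3      # placeholder threshold for over-chained adpositions
MAX_TOKENS = 35  # placeholder threshold for run-on sentences

# Hand-tagged sample sentence (assumed tags).
SAMPLE_SENTENCE = [
    ("The", "DET"), ("list", "NOUN"), ("of", "ADP"), ("words", "NOUN"),
    ("in", "ADP"), ("the", "DET"), ("post", "NOUN"), ("on", "ADP"),
    ("top", "NOUN"), ("of", "ADP"), ("this", "DET"), ("page", "NOUN"),
    ("is", "AUX"), ("just", "ADV"), ("long", "ADJ"),
]


def flag_sentence(tagged):
    """Return a list of reasons a (text, pos) sentence deserves a second look."""
    reasons = []
    words = {text.lower() for text, _ in tagged}
    hits = sorted(PROBLEM_WORDS & words)
    if hits:
        reasons.append("problem words: " + ", ".join(hits))
    adp_count = sum(1 for _, pos in tagged if pos == "ADP")
    if adp_count > MAX_ADP:
        reasons.append(f"{adp_count} adpositions")
    if len(tagged) > MAX_TOKENS:
        reasons.append(f"{len(tagged)} tokens")
    return reasons


if __name__ == "__main__":
    print(flag_sentence(SAMPLE_SENTENCE))
```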

Adverbs[2]

I’m going to remove all intensifiers. Again, they make me sound like a snake oil salesman.

the sometimes often begrudgingly correct articles

just the content

UNICEF’s most efficient lifesaving programs

an entirely absurd topic
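Following the rule in footnote 2, here is a sketch that drops adverbs on an intensifier list and keeps everything else, including temporal adverbs like “then”. The `INTENSIFIERS` set and the hand-tagged inputs are assumptions for illustration.

```python
# Assumed intensifier list; temporal adverbs such as "then" are deliberately absent.
INTENSIFIERS = {"just", "really", "entirely", "most", "very", "totally"}


def strip_intensifiers(tagged):
    """Drop ADV tokens that are intensifiers and rejoin the rest with spaces."""
    kept = [
        text
        for text, pos in tagged
        if not (pos == "ADV" and text.lower() in INTENSIFIERS)
    ]
    return " ".join(kept)


if __name__ == "__main__":
    print(strip_intensifiers([("an", "DET"), ("entirely", "ADV"), ("absurd", "ADJ"), ("topic", "NOUN")]))
```

Joining with spaces loses the original spacing around punctuation; with spaCy tokens, `token.text_with_ws` preserves it.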

Auxiliary Verbs & Subjunctive Conditionals[3]

These verbs indicate writing in the “passive voice”, using “modal words”, and “hedging”. Using modal words leads directly to an abuse of subjunctive conditionals.

It would seem that the results are inconclusive
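A sketch of the footnote 3 rules as a token check: flag modal auxiliaries, and flag “that” when it is tagged as a subordinating conjunction. The `MODALS` set and the hand tagging of the example sentence are assumptions; spaCy's English models also expose the Penn Treebank tag `MD` for modals through `token.tag_`, which could replace the word list.

```python
# Assumed list of English modal auxiliaries.
MODALS = {"would", "could", "might", "may", "should", "must", "can", "will", "shall"}

# "It would seem that the results are inconclusive", tagged by hand.
EXAMPLE = [
    ("It", "PRON"), ("would", "AUX"), ("seem", "VERB"), ("that", "SCONJ"),
    ("the", "DET"), ("results", "NOUN"), ("are", "AUX"), ("inconclusive", "ADJ"),
]


def hedging_flags(tagged):
    """Return (index, word, reason) for modal auxiliaries and subordinating 'that'."""
    flags = []
    for i, (text, pos) in enumerate(tagged):
        word = text.lower()
        if pos == "AUX" and word in MODALS:
            flags.append((i, word, "modal auxiliary"))
        elif pos == "SCONJ" and word == "that":
            flags.append((i, word, "subordinating that"))
    return flags


if __name__ == "__main__":
    print(hedging_flags(EXAMPLE))
```

Note that “are” passes: it is an auxiliary, but not a modal one.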

Linting + Checklist

As a result of this experiment, I have started taking linting seriously for my blog. I previously wrote Python scripts to check specific rules, such as “every quote must have a figcaption under it” or “all words in quotes must be italicized”. The problem is that these scripts can’t do what grammar-informed tools can.
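For contrast, a sketch of what such rule scripts might look like, assuming quotes are written as HTML `<blockquote>` elements followed by `<figcaption>`, and that italics use Markdown `*` or `_`. The function names and the sample text are hypothetical reconstructions, not my actual scripts.

```python
import re

# Hypothetical rules: a </blockquote> must be followed by a <figcaption>,
# and a curly-quoted phrase must be wrapped in Markdown italics, e.g. *“word”*.
BLOCKQUOTE_END = re.compile(r"</blockquote>\s*(<figcaption)?", re.IGNORECASE)
CURLY_QUOTE = re.compile(r"“[^”\n]+”")

SAMPLE = (
    "<figure>\n<blockquote>Hi</blockquote>\n<figcaption>Me</figcaption>\n</figure>\n"
    "<blockquote>Bye</blockquote>\nA *“word”* and a “phrase”."
)


def quotes_missing_figcaption(text):
    """Return the line numbers of blockquotes not followed by a figcaption."""
    return [
        text.count("\n", 0, m.start()) + 1
        for m in BLOCKQUOTE_END.finditer(text)
        if m.group(1) is None
    ]


def unitalicized_quotes(text):
    """Return curly-quoted phrases that are not wrapped in * or _ on both sides."""
    problems = []
    for m in CURLY_QUOTE.finditer(text):
        before = text[m.start() - 1] if m.start() > 0 else ""
        after = text[m.end()] if m.end() < len(text) else ""
        if not (before == after and before in {"*", "_"}):
            problems.append(m.group())
    return problems


if __name__ == "__main__":
    print(quotes_missing_figcaption(SAMPLE), unitalicized_quotes(SAMPLE))
```

Neither rule knows whether a word is a noun or an adverb, which is the gap a grammar-informed tool fills.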

Grammar-informed linters have formalized capabilities for parsing the source “language”[4] and running rules on it. I haven’t been linting actively because no existing linter did what I needed. Markdownlint is a useful linter, but it targets Markdown, not words[5]. Proselint is good at identifying misused words, but it targets words, not parts of speech.
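The missing piece is a linter whose rules match part-of-speech sequences rather than strings. A minimal sketch of such a PoS linter, assuming `(text, pos)` pairs per sentence; the rule names and tag patterns are hypothetical, chosen to echo the habits described earlier (adverbs before verbs and chained adpositions).

```python
# Hypothetical rules: each maps a name to a contiguous sequence of POS tags.
RULES = {
    "adverb before verb": ("ADV", "VERB"),
    "chained adpositions": ("NOUN", "ADP", "NOUN", "ADP"),
}

# Hand-tagged sample sentence (assumed tags).
SAMPLE = [
    ("I", "PRON"), ("really", "ADV"), ("think", "VERB"), ("the", "DET"),
    ("list", "NOUN"), ("of", "ADP"), ("words", "NOUN"), ("in", "ADP"),
    ("posts", "NOUN"),
]


def lint_sentence(tagged, rules=RULES):
    """Return (rule name, start index, matched words) for every rule match."""
    tags = [pos for _, pos in tagged]
    findings = []
    for name, pattern in rules.items():
        size = len(pattern)
        # Slide a window of the pattern's length across the tag sequence.
        for i in range(len(tags) - size + 1):
            if tuple(tags[i:i + size]) == pattern:
                words = " ".join(text for text, _ in tagged[i:i + size])
                findings.append((name, i, words))
    return findings


if __name__ == "__main__":
    for finding in lint_sentence(SAMPLE):
        print(finding)
```

Exempting temporal adverbs, as footnote 2 does, would mean checking the word as well as the tag.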

My blog pre/post-upload checklist:

  1. Markdownlint
  2. Proselint
  3. PoS linter
  4. Reread
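The first two steps can be scripted. A sketch that runs each linter on a post and reports the result, assuming the CLIs are invoked as `markdownlint <file>` (from markdownlint-cli) and `proselint <file>`, and that each exits non-zero when it reports problems; check those assumptions against your installed versions. The PoS linter and the reread are not shell commands here, so they are left out.

```python
import shutil
import subprocess
import sys

# Assumed command lines for the first two checklist steps.
DEFAULT_STEPS = [
    ("Markdownlint", ["markdownlint"]),
    ("Proselint", ["proselint"]),
]


def run_checklist(path, steps=DEFAULT_STEPS):
    """Run each step's command on `path`; return {step name: status}."""
    results = {}
    for name, cmd in steps:
        if shutil.which(cmd[0]) is None:
            results[name] = "skipped (not installed)"
            continue
        # Assumes a non-zero exit code means the linter reported problems.
        proc = subprocess.run(cmd + [path], capture_output=True, text=True)
        results[name] = "passed" if proc.returncode == 0 else "failed"
    return results


if __name__ == "__main__":
    if len(sys.argv) > 1:
        for name, status in run_checklist(sys.argv[1]).items():
            print(f"{name}: {status}")
```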

The file for the Jupyter Notebook is here


  1. It’s interesting that the parsing is done with specialized ML models instead of formal grammar parsing tools. Reviewing spaCy made me reflect on what I already know about LL/LR parsing, BNF, and compilers, though I had never realized that parser generators are specialized transpilers/compilers.

  2. Temporal adverbs like “so”, “then”, and “earlier” are permitted. What I don’t like are intensifiers like “just” or “really”.

  3. Auxiliary verbs are allowed except for modal auxiliaries like “would” and passive auxiliaries like “being”. Any subordinating conjunction is fine except for “that”.

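  4. My Python scripts do use regular expressions, so they’re not far from a BNF parser generator, although regular expressions can describe only a subset of the languages BNF grammars can.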

  5. It’s interesting how you can load/exec another file in Ruby.