Hi

“You know,” said Arthur, “it’s at times like this, when I’m trapped in a Vogon airlock with a man from Betelgeuse, and about to die of asphyxiation in deep space that I really wish I’d listened to what my mother told me when I was young.”
“Why, what did she tell you?”
“I don’t know, I didn’t listen.”

-THGTTG

Education

Bachelor of Computer Engineering.

University of Pune

MIT College of Engineering.
Score: 80.87%
Rank: 1 / 142
First Class w/ Distinction

Coursework

Year	Course	Score
I	Mathematics I	87.2
I	Mathematics II	83.0

Curriculum Vitae

VIEW AS A PDF

Final Year Project

Abstract: Automatic summarization is the process of reducing a text document with a computer program in order to create a summary that retains the most important points of the original document. Technologies that can make a coherent summary take into account variables such as length, writing style and syntax. Automatic data summarization is part of machine learning and data mining...

Data Scientist - NLP, Riverus Technology Solutions

I started working in the legal tech field in 2017. I work on a range of Natural Language Processing (NLP) tasks using Machine Learning: text classification, clustering, abstractive and extractive summarization, chatbots, topic modelling, NER and production-ready ML pipelines. Extensive experience with PostgreSQL, Python, TensorFlow and scikit-learn, RESTful APIs and code deployment on servers (AWS EC2).

A Discourse on Precedents
Case Text Classification
A legal judgment was parsed into a paragraph-sentence-word hierarchy. Sentences were classified into several classes: Issues, Facts, Arguments, Holdings and Outcomes, and Reasoning. This effectively reduced a large unstructured case into a structured, easy-to-interpret document. A Random Forest model produced the best results. An example of a parsed document can be found here: India Cements Ltd. vs. Commissioner of Income Tax.
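The first step above, turning raw text into a paragraph-sentence-word hierarchy, can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the pipeline used in the project: the function name `parse_judgment`, the blank-line paragraph rule and the naive sentence splitter are all assumptions. Real judgments contain abbreviations such as "Ltd." and "vs." that would need a proper sentence tokenizer.

```python
import re

def parse_judgment(text):
    """Split raw text into a paragraph -> sentence -> word hierarchy."""
    hierarchy = []
    # Assumption: paragraphs are separated by one or more blank lines.
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        sentences = []
        # Naive sentence boundary: '.', '!' or '?' followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph.strip()):
            words = re.findall(r"[A-Za-z0-9']+", sentence)
            if words:
                sentences.append(words)
        if sentences:
            hierarchy.append(sentences)
    return hierarchy

# Made-up sample text, for illustration only.
sample = (
    "The assessee filed an appeal. The issue is whether the expense is deductible.\n\n"
    "The appeal is allowed."
)
parsed = parse_judgment(sample)
```

Each sentence in `parsed` is a list of words, so a classifier can later attach one label (Issue, Fact, Argument and so on) per sentence.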
Implementing Autocorrect for legal noun phrases
Working towards traversing the parse tree to extract various elements from a case. Extensive use of spaCy and NLTK. Uses PostgreSQL Full Text Search with GIN and GiST indexes.
Implemented stored procedures, materialized views, triggers and notifications, and normalization. All of this is supported by Python APIs built using Flask hosted on an AWS server.
Notice how typing the word 'order' also fetches phrases with 'order' being the last word in the phrase.
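The behaviour noted above, where 'order' matches a phrase regardless of the word's position, follows from matching on individual tokens rather than on the start of the phrase. A rough in-memory sketch of that idea is below. It is not PostgreSQL's implementation and does no stemming; the helper names `build_index` and `suggest` and the sample phrases are hypothetical.

```python
from collections import defaultdict

def build_index(phrases):
    """Map each lowercase token to the set of phrases containing it."""
    index = defaultdict(set)
    for phrase in phrases:
        for token in phrase.lower().split():
            index[token].add(phrase)
    return index

def suggest(index, typed):
    """Return phrases in which any token starts with the typed text."""
    typed = typed.lower().strip()
    if not typed:
        return []
    matches = set()
    for token, phrases in index.items():
        if token.startswith(typed):
            matches |= phrases
    return sorted(matches)

# Hypothetical noun phrases, for illustration only.
phrases = ["order of remand", "impugned order", "assessment order", "tax liability"]
index = build_index(phrases)
suggestions = suggest(index, "order")
```

Here `suggestions` contains "impugned order" and "assessment order" as well as "order of remand", because the lookup is keyed on tokens and not on phrase prefixes.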
Project Severus
Project Severus (also seen above) is an analytics tool built with React/Redux and Python. With smooth annotations, faster report generation and a space for data science experimentation, Severus (an in-house tool) is currently used by over 15 people. API requests are handled by AWS Lambda functions. A JWT mechanism is integrated into the system. Releasing an open source version soon.
Process Orchestration
Triggers and stored procedures, analytics pipelines churning out quality data in real time, and pushing it to the product. Too lazy to write this properly right now, some time soon probably.
Parsing Grammar in legal texts to Extract Keyphrases
Working towards traversing the parse tree to extract various elements from a case. Extensive use of spaCy and NLTK.
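As a rough illustration of parse-tree traversal for keyphrase extraction, the sketch below walks a constituency tree depth-first and collects every noun phrase. The tree is hand-written as nested tuples rather than produced by spaCy or NLTK, and the names `leaves` and `noun_phrases` are hypothetical; treating "NP" subtrees as keyphrases is an assumption for the example, not the project's actual extraction rule.

```python
def leaves(node):
    """Collect the words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    _label, *children = node
    words = []
    for child in children:
        words.extend(leaves(child))
    return words

def noun_phrases(node, found=None):
    """Depth-first walk that records the text of every NP subtree."""
    if found is None:
        found = []
    if isinstance(node, str):
        return found
    label, *children = node
    if label == "NP":
        found.append(" ".join(leaves(node)))
    for child in children:
        noun_phrases(child, found)
    return found

# Hand-written tree for "The tribunal dismissed the appeal".
tree = (
    "S",
    ("NP", ("DT", "The"), ("NN", "tribunal")),
    ("VP", ("VBD", "dismissed"), ("NP", ("DT", "the"), ("NN", "appeal"))),
)
keyphrases = noun_phrases(tree)
```

Nested noun phrases are recorded at every level, so a later filtering step would be needed to keep only the longest or shortest span.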

Projects

A Novel Approach to Automatic Text Summarization

Abstract:
Automatic summarization is the process of reducing a text document with a computer program in order to create a summary that retains the most important points of the original document. Technologies that can make a coherent summary take into account variables such as length, writing style and syntax. Automatic data summarization is part of machine learning and data mining. The main idea of summarization is to find a representative subset of the data, which contains the information of the entire set. Summarization technologies are used in a large number of sectors in industry today. An example of the use of summarization technology is search engines such as Google. Other examples include document summarization, image collection summarization and video summarization. Document summarization tries to automatically create a representative summary or abstract of the entire document by finding the most informative sentences. We implement an algorithm for generating extractive summaries and present a standalone application for the same.
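To make "finding the most informative sentences" concrete, here is a minimal sketch of a generic word-frequency baseline for extractive summarization. It is not the algorithm from this project; the function name `summarize`, the tiny stop-word list and the scoring rule (average word frequency per sentence) are assumptions chosen for illustration.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "that", "was", "on"}

def summarize(text, max_sentences=1):
    """Pick the highest-scoring sentences and return them in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    tokenized = [
        [w for w in re.findall(r"[a-z']+", s.lower()) if w not in STOP_WORDS]
        for s in sentences
    ]
    freq = Counter(w for words in tokenized for w in words)

    def score(i):
        # Average word frequency, so long sentences are not favoured by length alone.
        words = tokenized[i]
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank by score (ties broken by position), then restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: (-score(i), i))
    chosen = sorted(ranked[:max_sentences])
    return [sentences[i] for i in chosen]

# Made-up sample text, for illustration only.
text = (
    "The court examined the tax appeal. "
    "The tax appeal concerned a disputed tax deduction. "
    "Lunch was served at noon."
)
summary = summarize(text, max_sentences=1)
```

Because the output is a subset of the original sentences, the summary is extractive: nothing is rephrased or generated.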
Text Generation using LSTM RNN
The Holy Bible was used as the training corpus. A generative Recurrent Neural Network was built to generate verses, using the Keras framework in Python with GPU acceleration. Some of the generated verses are as follows:
"Aaron and the whole congregation said unto them would god that we had died in the land of sand"
"And he sail unto him in the sevent of the land and tone the fate of eveel"
"And the lord had made unto hohh abrer his wife she cool and after the nate when in the eroit of the ark and lose be air drown"

Request a copy

Black-Letter Law

Universally acknowledged non-IT laws and rules extracted from the IT corpus.

Read

Anatomy of an Indian Tax Case

Upon analysis, coherence and lucidity were noted in the writing style of HC (High Court) and SC (Supreme Court) judges. They start by clearly describing an issue and then discuss it in detail before proceeding to another issue. On the other hand, there is a state of chaos in the ITAT (Tribunal) judgments, with no such set pattern.

Read on Medium

Python + Flask + Nginx + EC2 = Awesomeness!

A place to start making those amazing ML pipelines production ready!

Read on Medium

The less boring stuff

Thoughts and shorts.

Explore Blog

Get in touch

Like Data Science, Seinfeld, tea or beer? Want to discuss a project? Hit me up.

Email
varunchitale@gmail.com
Social
