Senior Design Team sdmay21-35 • A Part-of-Speech Tagger for Software Documentation


Project Description:
The purpose of this senior design project is to create a part-of-speech (POS) tagger that accepts software documentation in the form of HTML files. These files are fed into a training pipeline, which constructs a model that can then be used to tag future sets of documentation. The main problem we are trying to solve is that the standard POS tagging solutions currently used in industry cannot tag natural language that appears alongside code. Our solution augments the widely used Stanford NLP pipeline, which is used for conventional natural language processing (NLP).
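As a rough illustration of the input side of such a pipeline, the sketch below splits an HTML documentation fragment into prose and code segments using only Python's standard library. It is not the team's implementation (the project builds on the Stanford NLP pipeline). The names DocSegmenter and segment_html are hypothetical, and treating only <code> and <pre> elements as code is an assumption made for this example.

```python
from html.parser import HTMLParser


class DocSegmenter(HTMLParser):
    """Collect (kind, text) segments, where kind is 'code' or 'prose'.

    Hypothetical helper for illustration only: text inside <code> or
    <pre> is labeled 'code'; all other text is labeled 'prose'.
    """

    CODE_TAGS = {"code", "pre"}  # assumption: these tags mark code

    def __init__(self):
        super().__init__()
        self.code_depth = 0   # > 0 while inside a code element
        self.segments = []    # list of (kind, text) tuples

    def handle_starttag(self, tag, attrs):
        if tag in self.CODE_TAGS:
            self.code_depth += 1

    def handle_endtag(self, tag):
        if tag in self.CODE_TAGS and self.code_depth > 0:
            self.code_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text:
            kind = "code" if self.code_depth > 0 else "prose"
            self.segments.append((kind, text))


def segment_html(html):
    """Return the prose/code segments of an HTML string, in document order."""
    parser = DocSegmenter()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return parser.segments
```

Calling segment_html on a fragment such as "<p>Call <code>list.append(x)</code> to add an item.</p>" yields the surrounding English as prose segments and list.append(x) as a code segment. A tagger that handles mixed input could use a split like this to decide which tokens need code-aware handling.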

Project Vision:
Bring the power and flexibility of natural language processing to software documentation.
Create a part-of-speech tagger for software documentation that tags both English and code fragments, even when the two are heavily mixed.
This has wide-reaching benefits:
  • More training data for natural language <=> code generation
  • Ability to infer information from documentation
  • Possible automatic generation of documentation

High Level Overview - News Report