spaCy
About the Tutorial
spaCy, developed by software developers Matthew Honnibal and Ines Montani, is an
open-source software library for advanced NLP (Natural Language Processing). It is written
in Python and Cython (a superset of Python that compiles to C, designed mainly to give
C-like performance to Python programs). spaCy is a relatively new framework, but it is
one of the most powerful and advanced libraries for implementing NLP.
Audience
This tutorial will be useful for graduates, post-graduates, and research students who are
either interested in NLP or study it as part of their curriculum. The reader can be a
beginner or an advanced learner.
Prerequisites
The reader must have basic knowledge of NLP and artificial intelligence. Readers should
also be familiar with basic English grammar terminology and with Python programming
concepts.
Copyright & Disclaimer
Copyright 2021 by Tutorials Point (I) Pvt. Ltd.
All the content and graphics published in this e-book are the property of Tutorials Point (I)
Pvt. Ltd. The user of this e-book is prohibited to reuse, retain, copy, distribute or republish
any contents or a part of contents of this e-book in any manner without written consent
of the publisher.
We strive to update the contents of our website and tutorials as promptly and as precisely as
possible; however, the contents may contain inaccuracies or errors. Tutorials Point (I) Pvt.
Ltd. provides no guarantee regarding the accuracy, timeliness or completeness of our
website or its contents including this tutorial. If you discover any errors on our website or
in this tutorial, please notify us at contact@tutorialspoint.com
Table of Contents
About the Tutorial
Audience
Prerequisites
Copyright & Disclaimer
Table of Contents
1. spaCy — Introduction
Extensions and visualisers
2. spaCy — Getting Started
3. spaCy — Models and Languages
4. spaCy — Architecture
5. spaCy — Command Line Helpers
6. spaCy — Top-level Functions
7. spaCy — Visualization Function
8. spaCy — Utility Functions
9. spaCy — Compatibility Functions
10. spaCy — Containers
11. spaCy — Doc Class ContextManager and Property
Retokenizer.split
12. spaCy — Container Token Class
13. spaCy — Token Properties
14. spaCy — Container Span Class
15. spaCy — Span Class Properties
16. spaCy — Container Lexeme Class
17. spaCy — Training Neural Network Model
Steps for Training
18. spaCy — Updating Neural Network Model
1. spaCy — Introduction
In this chapter, we will look at the features, extensions and visualisers that come with
spaCy. A feature comparison is also provided to help readers weigh the functionality of
spaCy against that of the Natural Language Toolkit (NLTK) and CoreNLP. Here, NLP refers
to Natural Language Processing.
What is spaCy?
spaCy, developed by the software developers Matthew Honnibal and Ines Montani, is an
open-source software library for advanced NLP. It is written in Python and Cython (a
superset of Python that compiles to C, designed mainly to give C-like performance to
Python programs).
spaCy is a relatively new framework, but it is one of the most powerful and advanced
libraries for implementing NLP.
Features
Some of the features of spaCy that make it popular are explained below:
Fast: spaCy is specially designed to be as fast as possible.
Accuracy: spaCy's implementation of its labelled dependency parser makes it one of the
most accurate frameworks of its kind (within 1% of the best available).
Batteries included: spaCy ships with the following built-in components:
Index-preserving tokenization.
“Alpha tokenization” support for more than 50 languages.
Part-of-speech tagging.
Pre-trained word vectors.
Built-in easy and beautiful visualizers for named entities and syntax.
Text classification.
Extensible: You can easily use spaCy with other existing tools such as TensorFlow, Gensim,
scikit-learn, etc.
Deep learning integration: It includes Thinc, a deep learning framework designed for
NLP tasks.
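Index-preserving tokenization means that every token remembers the character offset at which it starts in the original text, so the input can always be rebuilt exactly from the tokens. The following is a minimal sketch, assuming spaCy (v2 or v3) is installed. It uses spacy.blank("en"), a blank English pipeline that contains only the tokenizer, so no trained model has to be downloaded; the sample sentence is arbitrary.

```python
import spacy

# Assumes spaCy (v2 or v3) is installed. spacy.blank("en") builds an English
# pipeline containing only the rule-based tokenizer, so no trained model is needed.
nlp = spacy.blank("en")

# Arbitrary sample sentence chosen for illustration.
text = "spaCy isn't slow: it's fast!"
doc = nlp(text)

for token in doc:
    # token.idx is the character offset where the token starts in the input
    start = token.idx
    end = start + len(token.text)
    print(start, end, repr(token.text))
    # Slicing the original string at those offsets gives back the token text
    assert text[start:end] == token.text

# token.text_with_ws includes any trailing whitespace, so joining the tokens
# reproduces the original string character for character.
restored = "".join(token.text_with_ws for token in doc)
assert restored == text
```

Because offsets are preserved, annotations made on tokens (for example entity spans) can be mapped straight back onto the raw text. Part-of-speech tagging, named entity recognition and text classification need a trained pipeline rather than a blank one.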
Extensions and visualisers
Some of the easy-to-use extensions and visualisers that come with spaCy, all of them free,
open-source libraries, are listed below:
Thinc: It is a Machine Learning (ML) library optimised for Central Processing Unit (CPU)
usage. It is also designed for deep learning with text input and NLP tasks.