Metadata-Version: 2.4
Name: pythainlp
Version: 5.3.0
Summary: Thai Natural Language Processing library
Project-URL: homepage, https://pythainlp.org/
Project-URL: source, https://github.com/PyThaiNLP/pythainlp.git
Project-URL: download, https://pypi.org/project/pythainlp/#files
Project-URL: changelog, https://github.com/PyThaiNLP/pythainlp/blob/dev/CHANGELOG.md
Project-URL: releasenotes, https://github.com/PyThaiNLP/pythainlp/releases
Project-URL: documentation, https://pythainlp.org/docs/
Project-URL: issues, https://github.com/PyThaiNLP/pythainlp/issues
Project-URL: Tutorials, https://pythainlp.org/tutorials/
Author: Korakot Chaovavanich, Charin Polpanumas, Lalita Lowphansirikul, Pattarawat Chormai, Peerat Limkonchotiwat, Thanathip Suntorntip, Can Udomcharoenchaikit
Author-email: Wannaphong Phatthiyaphaibun <wannaphong@pythainlp.org>, Arthit Suriyawongkul <suriyawa@tcd.ie>
Maintainer-email: Wannaphong Phatthiyaphaibun <wannaphong@pythainlp.org>, Arthit Suriyawongkul <suriyawa@tcd.ie>
License-Expression: Apache-2.0
License-File: LICENSE
Keywords: NLP,Thai NLP,Thai language,ThaiNLP,computational linguistics,linguistics,localization,natural language processing,pythainlp,text processing,tokenization
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: Thai
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Localization
Classifier: Topic :: Text Processing
Classifier: Topic :: Text Processing :: General
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.9
Requires-Dist: importlib-resources; python_version < '3.11'
Requires-Dist: tzdata; sys_platform == 'win32'
Provides-Extra: abbreviation
Requires-Dist: khamyo>=0.2.0; extra == 'abbreviation'
Provides-Extra: attacut
Requires-Dist: attacut>=1.0.6; extra == 'attacut'
Provides-Extra: attaparse
Requires-Dist: attaparse>=1.0.0; extra == 'attaparse'
Provides-Extra: benchmarks
Requires-Dist: numpy>=1.22; extra == 'benchmarks'
Requires-Dist: pandas>=0.24; extra == 'benchmarks'
Requires-Dist: pyyaml>=5.4.1; extra == 'benchmarks'
Provides-Extra: budoux
Requires-Dist: budoux>=0.7.0; extra == 'budoux'
Provides-Extra: compact
Requires-Dist: nlpo3>=1.4.0; extra == 'compact'
Requires-Dist: numpy>=1.26.0; extra == 'compact'
Requires-Dist: pyicu>=2.3; extra == 'compact'
Requires-Dist: python-crfsuite>=0.9.7; extra == 'compact'
Requires-Dist: pyyaml>=5.4.1; extra == 'compact'
Provides-Extra: coreference-resolution
Requires-Dist: fastcoref>=2.1.5; extra == 'coreference-resolution'
Requires-Dist: spacy>=3.0; extra == 'coreference-resolution'
Provides-Extra: dependency-parsing
Requires-Dist: attaparse>=1.0.0; extra == 'dependency-parsing'
Requires-Dist: spacy-thai>=0.7.1; extra == 'dependency-parsing'
Requires-Dist: transformers>=4.22.1; extra == 'dependency-parsing'
Requires-Dist: ufal-chu-liu-edmonds>=1.0.2; extra == 'dependency-parsing'
Provides-Extra: dev
Requires-Dist: black>=25.11.0; extra == 'dev'
Requires-Dist: build>=1.0.0; extra == 'dev'
Requires-Dist: bump-my-version>=1.2.6; extra == 'dev'
Requires-Dist: coverage>=7.10.7; extra == 'dev'
Requires-Dist: flake8-type-checking>=3.2.0; extra == 'dev'
Requires-Dist: flake8>=7.0.0; extra == 'dev'
Requires-Dist: mypy>=1.19.1; extra == 'dev'
Requires-Dist: pylint>=4.0.0; extra == 'dev'
Requires-Dist: ruff>=0.14.14; extra == 'dev'
Requires-Dist: tox>=4.30.3; extra == 'dev'
Provides-Extra: docs
Requires-Dist: sphinx-copybutton>=0.5.2; extra == 'docs'
Requires-Dist: sphinx-rtd-theme>=3.1.0; extra == 'docs'
Requires-Dist: sphinx>=6.2; extra == 'docs'
Provides-Extra: el
Requires-Dist: multiel>=0.5; extra == 'el'
Provides-Extra: esupar
Requires-Dist: esupar>=1.3.8; extra == 'esupar'
Requires-Dist: numpy>=1.22; extra == 'esupar'
Requires-Dist: transformers>=4.22.1; extra == 'esupar'
Provides-Extra: extra
Requires-Dist: bpemb>=0.3.2; extra == 'extra'
Requires-Dist: budoux>=0.7.0; extra == 'extra'
Requires-Dist: gensim>=4.0.0; extra == 'extra'
Requires-Dist: nltk>=3.3; extra == 'extra'
Requires-Dist: pandas>=0.24; extra == 'extra'
Requires-Dist: ssg>=0.0.8; extra == 'extra'
Requires-Dist: symspellpy>=6.7.6; extra == 'extra'
Requires-Dist: tltk>=1.10; extra == 'extra'
Provides-Extra: full
Requires-Dist: attacut==1.0.6; extra == 'full'
Requires-Dist: attaparse==1.0.0; extra == 'full'
Requires-Dist: bpemb<0.4,>=0.3.6; extra == 'full'
Requires-Dist: budoux==0.7.0; extra == 'full'
Requires-Dist: deepcut==0.7.0.0; extra == 'full'
Requires-Dist: emoji<1,>=0.6.0; extra == 'full'
Requires-Dist: epitran==1.26.0; extra == 'full'
Requires-Dist: esupar<2,>=1.3.9; extra == 'full'
Requires-Dist: fairseq-fixed<0.13,==0.12.3.1; (python_version >= '3.11') and extra == 'full'
Requires-Dist: fairseq<0.13,>=0.10.0; (python_version < '3.11') and extra == 'full'
Requires-Dist: fastai<2,>=1.0.61; extra == 'full'
Requires-Dist: fastcoref==2.1.6; extra == 'full'
Requires-Dist: gensim<5,>=4.3.3; extra == 'full'
Requires-Dist: khamyo>=0.3.0; extra == 'full'
Requires-Dist: khanaa<1,>=0.1.1; extra == 'full'
Requires-Dist: nlpo3>=1.4.0; extra == 'full'
Requires-Dist: nltk<4,>=3.6.6; extra == 'full'
Requires-Dist: numpy<3,>=1.26.0; extra == 'full'
Requires-Dist: onnxruntime>=1.10.0; extra == 'full'
Requires-Dist: oskut>=1.3; extra == 'full'
Requires-Dist: pandas<3,>=2.2.0; extra == 'full'
Requires-Dist: panphon==0.22.2; extra == 'full'
Requires-Dist: phunspell==0.1.6; extra == 'full'
Requires-Dist: pyicu<3,>=2.15.2; extra == 'full'
Requires-Dist: python-crfsuite==0.9.12; extra == 'full'
Requires-Dist: pyyaml<6.0.2,>=5.4.1; extra == 'full'
Requires-Dist: sacremoses==0.1.1; extra == 'full'
Requires-Dist: sefr-cut>=1.1; extra == 'full'
Requires-Dist: sentence-transformers<3,>=2.7.0; extra == 'full'
Requires-Dist: sentencepiece==0.2.1; extra == 'full'
Requires-Dist: spacy-thai==0.7.8; extra == 'full'
Requires-Dist: spacy<4,==3.8.7; extra == 'full'
Requires-Dist: ssg==0.0.8; extra == 'full'
Requires-Dist: symspellpy==6.9.0; extra == 'full'
Requires-Dist: thai-nner==0.3; extra == 'full'
Requires-Dist: tltk<2,>=1.10; extra == 'full'
Requires-Dist: torch<3,>=1.13.1; extra == 'full'
Requires-Dist: transformers==4.57.6; extra == 'full'
Requires-Dist: ufal-chu-liu-edmonds==1.0.3; extra == 'full'
Requires-Dist: word2word<2,>=1.0.0; extra == 'full'
Requires-Dist: wtpsplit==1.3.0; extra == 'full'
Requires-Dist: wunsen==0.0.3; extra == 'full'
Provides-Extra: generate
Requires-Dist: fastai<2.0; extra == 'generate'
Provides-Extra: icu
Requires-Dist: pyicu>=2.3; extra == 'icu'
Provides-Extra: ipa
Requires-Dist: epitran>=1.1; extra == 'ipa'
Provides-Extra: ml
Requires-Dist: numpy>=1.22; extra == 'ml'
Requires-Dist: torch>=1.0.0; extra == 'ml'
Provides-Extra: mt5
Requires-Dist: sentencepiece>=0.1.91; extra == 'mt5'
Requires-Dist: transformers>=4.22.1; extra == 'mt5'
Provides-Extra: nlpo3
Requires-Dist: nlpo3>=1.4.0; extra == 'nlpo3'
Provides-Extra: noauto-cython
Requires-Dist: phunspell>=0.1.6; extra == 'noauto-cython'
Provides-Extra: noauto-network
Requires-Dist: huggingface-hub>=0.16.0; extra == 'noauto-network'
Provides-Extra: noauto-onnx
Requires-Dist: numpy>=1.26.0; extra == 'noauto-onnx'
Requires-Dist: onnxruntime>=1.10.0; extra == 'noauto-onnx'
Requires-Dist: oskut>=1.3; extra == 'noauto-onnx'
Requires-Dist: sefr-cut>=1.1; extra == 'noauto-onnx'
Provides-Extra: noauto-tensorflow
Requires-Dist: deepcut>=0.7.0; extra == 'noauto-tensorflow'
Requires-Dist: numpy>=1.26.0; extra == 'noauto-tensorflow'
Provides-Extra: noauto-torch
Requires-Dist: attacut>=1.0.6; extra == 'noauto-torch'
Requires-Dist: numpy>=1.26.0; extra == 'noauto-torch'
Requires-Dist: sentencepiece>=0.1.91; extra == 'noauto-torch'
Requires-Dist: thai-nner>=0.3; extra == 'noauto-torch'
Requires-Dist: tltk>=1.10; extra == 'noauto-torch'
Requires-Dist: torch>=1.13.1; extra == 'noauto-torch'
Requires-Dist: transformers>=4.22.1; extra == 'noauto-torch'
Requires-Dist: wtpsplit>=1.0.1; extra == 'noauto-torch'
Provides-Extra: onnx
Requires-Dist: numpy>=1.22; extra == 'onnx'
Requires-Dist: onnxruntime>=1.10.0; extra == 'onnx'
Requires-Dist: sentencepiece>=0.1.91; extra == 'onnx'
Provides-Extra: oskut
Requires-Dist: oskut>=1.3; extra == 'oskut'
Provides-Extra: qwen3
Requires-Dist: torch>=1.9.0; extra == 'qwen3'
Requires-Dist: transformers>=4.22.1; extra == 'qwen3'
Provides-Extra: sefr-cut
Requires-Dist: sefr-cut>=1.1; extra == 'sefr-cut'
Provides-Extra: spacy-thai
Requires-Dist: spacy-thai>=0.7.1; extra == 'spacy-thai'
Provides-Extra: spell
Requires-Dist: phunspell>=0.1.6; extra == 'spell'
Requires-Dist: symspellpy>=6.7.6; extra == 'spell'
Provides-Extra: ssg
Requires-Dist: ssg>=0.0.8; extra == 'ssg'
Provides-Extra: textaugment
Requires-Dist: bpemb>=0.3.2; extra == 'textaugment'
Requires-Dist: gensim>=4.0.0; extra == 'textaugment'
Provides-Extra: thai-nner
Requires-Dist: thai-nner>=0.3; extra == 'thai-nner'
Provides-Extra: thai2fit
Requires-Dist: emoji>=0.5.1; extra == 'thai2fit'
Requires-Dist: gensim>=4.0.0; extra == 'thai2fit'
Requires-Dist: numpy>=1.22; extra == 'thai2fit'
Provides-Extra: thai2rom
Requires-Dist: numpy>=1.22; extra == 'thai2rom'
Requires-Dist: torch>=1.0.0; extra == 'thai2rom'
Provides-Extra: transformers-ud
Requires-Dist: transformers>=4.22.1; extra == 'transformers-ud'
Requires-Dist: ufal-chu-liu-edmonds>=1.0.2; extra == 'transformers-ud'
Provides-Extra: translate
Requires-Dist: fairseq-fixed<0.13,==0.12.3.1; (python_version >= '3.11') and extra == 'translate'
Requires-Dist: fairseq<0.13,>=0.10.0; (python_version < '3.11') and extra == 'translate'
Requires-Dist: sacremoses>=0.0.41; extra == 'translate'
Requires-Dist: sentencepiece>=0.1.91; extra == 'translate'
Requires-Dist: torch>=1.0.0; extra == 'translate'
Requires-Dist: transformers>=4.22.1; extra == 'translate'
Requires-Dist: word2word>=1.0.0; extra == 'translate'
Provides-Extra: wangchanberta
Requires-Dist: sentencepiece>=0.1.91; extra == 'wangchanberta'
Requires-Dist: transformers>=4.22.1; extra == 'wangchanberta'
Provides-Extra: wangchanglm
Requires-Dist: pandas>=0.24; extra == 'wangchanglm'
Requires-Dist: sentencepiece>=0.1.91; extra == 'wangchanglm'
Requires-Dist: transformers>=4.22.1; extra == 'wangchanglm'
Provides-Extra: word-approximation
Requires-Dist: panphon>=0.20.0; extra == 'word-approximation'
Provides-Extra: wordnet
Requires-Dist: nltk>=3.3; extra == 'wordnet'
Provides-Extra: wsd
Requires-Dist: sentence-transformers>=2.2.2; extra == 'wsd'
Provides-Extra: wtp
Requires-Dist: transformers>=4.22.1; extra == 'wtp'
Requires-Dist: wtpsplit>=1.0.1; extra == 'wtp'
Provides-Extra: wunsen
Requires-Dist: wunsen>=0.0.3; extra == 'wunsen'
Description-Content-Type: text/markdown

# PyThaiNLP: Thai Natural Language Processing in Python

![Project Logo](./docs/images/logo.png)

[![pypi](https://img.shields.io/pypi/v/pythainlp.svg)](https://pypi.python.org/pypi/pythainlp)
[![Python 3.9+](https://img.shields.io/badge/python-3.9%2B-blue.svg)](https://www.python.org/downloads/)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3519354.svg)](https://doi.org/10.5281/zenodo.3519354)
[![Project Status: Active](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)
[![Codacy Grade](https://app.codacy.com/project/badge/Grade/5821a0de122041c79999bbb280230ffb)](https://www.codacy.com/gh/PyThaiNLP/pythainlp/dashboard?utm_source=github.com&amp;utm_medium=referral&amp;utm_content=PyThaiNLP/pythainlp&amp;utm_campaign=Badge_Grade)
[![Coverage Status](https://coveralls.io/repos/github/PyThaiNLP/pythainlp/badge.svg?branch=dev)](https://coveralls.io/github/PyThaiNLP/pythainlp?branch=dev)
[![Google Colab Badge](https://badgen.net/badge/Launch%20Quick%20Start%20Guide/on%20Google%20Colab/blue?icon=terminal)](https://colab.research.google.com/github/PyThaiNLP/tutorials/blob/master/source/notebooks/pythainlp_get_started.ipynb)
[![Facebook](https://img.shields.io/badge/Facebook-0866FF?style=flat&logo=facebook&logoColor=white)](https://www.facebook.com/pythainlp/)
[![Chat on Matrix](https://matrix.to/img/matrix-badge.svg)](https://matrix.to/#/#thainlp:matrix.org)

[pythainlp.org](https://pythainlp.org/)
| [Tutorials](https://pythainlp.org/tutorials)
| [License info](https://pythainlp.org/dev-docs/notes/license.html)
| [Model cards](https://github.com/PyThaiNLP/pythainlp/wiki/Model-Cards)
| [Adopters](https://github.com/PyThaiNLP/pythainlp/blob/dev/INTHEWILD.md)
| *[เอกสารภาษาไทย](https://github.com/PyThaiNLP/pythainlp/blob/dev/README_TH.md)*

Designed to be a Thai-focused counterpart to [NLTK](https://www.nltk.org/),
**PyThaiNLP** provides standard tools for linguistic analysis under
an Apache-2.0 license, with its data and models covered by CC0-1.0
and CC-BY-4.0.

```sh
pip install pythainlp
```

| Version | Python version | Changes | Documentation |
|:-------:|:--------------:|:-------:|:-------------:|
| [5.3.0](https://github.com/PyThaiNLP/pythainlp/releases) | 3.9+ | [Log](https://github.com/PyThaiNLP/pythainlp/issues/1080) | [pythainlp.org/docs](https://pythainlp.org/docs) |
| [`dev`](https://github.com/PyThaiNLP/pythainlp/tree/dev) | 3.9+ | [Log](https://github.com/PyThaiNLP/pythainlp/issues/1169) | [pythainlp.org/dev-docs](https://pythainlp.org/dev-docs/) |

## Features

- **Linguistic units:** Sentence, word, and subword segmentation
  (`sent_tokenize`, `word_tokenize`, `subword_tokenize`).
- **Tagging:** Part-of-speech tagging (`pos_tag`).
- **Transliteration:** Romanization (`transliterate`) and IPA conversion.
- **Correction:** Spelling suggestion and correction (`spell`, `correct`).
- **Utilities:** Soundex, collation, number-to-text (`bahttext`), datetime
  formatting (`thai_strftime`), and keyboard layout correction.
- **Data:** Built-in Thai character sets, word lists, and stop words.
- **CLI:** Command-line interface via `thainlp`.

  ```sh
  thainlp data catalog  # List datasets
  thainlp help          # Show usage
  ```

## Installation options

To install with specific extras (e.g., `translate`, `wordnet`, `full`):

```sh
pip install "pythainlp[extra1,extra2,...]"
```

Available extras include:

- `compact` — a small, stable subset of dependencies (recommended)
- `translate` — machine translation support
- `wordnet` — WordNet support
- `full` — all optional dependencies (may introduce conflicts)

The documentation website maintains the
[full list of extras](https://pythainlp.org/dev-docs/notes/installation.html).
To see the specific libraries included in each extra,
please inspect the `[project.optional-dependencies]` section of
[`pyproject.toml`](https://github.com/PyThaiNLP/pythainlp/blob/dev/pyproject.toml).

## Environment variables

| Variable | Description | Status |
|---|---|---|
| `PYTHAINLP_DATA` | Path to the data directory (default: `~/pythainlp-data`). | Current |
| `PYTHAINLP_DATA_DIR` | Legacy alias for `PYTHAINLP_DATA`. Emits a `DeprecationWarning`. Setting both raises `ValueError`. | Deprecated; use `PYTHAINLP_DATA` |
| `PYTHAINLP_OFFLINE` | Set to `1` to disable automatic corpus downloads. Explicit `download()` calls still work. | Current |
| `PYTHAINLP_READ_ONLY` | Set to `1` to enable read-only mode, which prevents implicit background writes to PyThaiNLP's internal data directory (corpus downloads, catalog updates, directory creation). Explicit user-initiated saves to user-specified paths are unaffected. | Current |
| `PYTHAINLP_READ_MODE` | Legacy alias for `PYTHAINLP_READ_ONLY`. Emits a `DeprecationWarning`. Setting both raises `ValueError`. | Deprecated; use `PYTHAINLP_READ_ONLY` |

### Data directory

PyThaiNLP downloads data (see the data catalog `db.json` at
[pythainlp-corpus](https://github.com/PyThaiNLP/pythainlp-corpus))
to `~/pythainlp-data` by default.
Set the `PYTHAINLP_DATA` environment variable to override this location.
(`PYTHAINLP_DATA_DIR` is still accepted but deprecated.)

When using PyThaiNLP in distributed computing environments
(e.g., Apache Spark), set the `PYTHAINLP_DATA` environment variable
inside the function that will be distributed to worker nodes.
See details in
[the documentation](https://pythainlp.org/dev-docs/notes/installation.html).
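
For example, a worker function shipped to each node might set the variable before any `pythainlp` import. The function names and the path below are hypothetical, for illustration only:

```python
import os

def configure_pythainlp_data(path="/tmp/pythainlp-data"):
    """Point PyThaiNLP at a worker-local data directory.
    Must run inside the distributed function, before pythainlp is imported."""
    os.environ["PYTHAINLP_DATA"] = path

def tokenize_partition(rows):
    configure_pythainlp_data()
    # Import only after the environment is configured,
    # so every worker resolves the same local path.
    from pythainlp import word_tokenize
    return [word_tokenize(row) for row in rows]
```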

### Offline mode

Set `PYTHAINLP_OFFLINE=1` to disable **automatic** corpus downloads.
When this variable is set and a corpus is not already cached locally,
a `FileNotFoundError` is raised instead of attempting a network download.
Explicit calls to `pythainlp.corpus.download()` are unaffected.
Use `pythainlp.is_offline_mode()` to check the current state programmatically.

```python
import pythainlp
print(pythainlp.is_offline_mode())  # True if PYTHAINLP_OFFLINE=1
```

### Read-only mode

Set `PYTHAINLP_READ_ONLY=1` to prevent implicit background writes to PyThaiNLP's
internal data directory. This blocks corpus downloads, catalog updates, and
automatic data directory creation — writes that happen as side effects the user
may not be aware of.

> **Note:** Read-only mode is more restrictive than offline mode.
> `PYTHAINLP_OFFLINE=1` blocks only *automatic* downloads triggered by
> `get_corpus_path()`; explicit `pythainlp.corpus.download()` calls still work.
> `PYTHAINLP_READ_ONLY=1` also blocks explicit `download()` calls, because any
> download requires writing to the data directory.
> Use `PYTHAINLP_READ_ONLY` when the data directory is on a read-only file system
> (e.g., a read-only Docker volume or a shared cluster mount).

Operations where the user explicitly specifies an output path are unaffected
(e.g., `model.save("path")`, `tagger.train(..., save_loc="path")`,
`thainlp misspell --output myfile.txt`).

Use `pythainlp.is_read_only_mode()` to check the current state programmatically.

```python
import pythainlp
print(pythainlp.is_read_only_mode())  # True if PYTHAINLP_READ_ONLY=1
```

## Testing

We test core functionalities on all officially supported Python versions.

See [tests/README.md](./tests/README.md) for the test matrix and other details.

## Contribute to PyThaiNLP

Please fork and create a pull request.
See [CONTRIBUTING.md](https://github.com/PyThaiNLP/pythainlp/blob/dev/CONTRIBUTING.md)
for guidelines and algorithm references.

## Citations

If you use the PyThaiNLP library in your project,
please cite the software as follows:

> Phatthiyaphaibun, Wannaphong, Korakot Chaovavanich, Charin Polpanumas,
> Arthit Suriyawongkul, Lalita Lowphansirikul, and Pattarawat Chormai.
> “PyThaiNLP: Thai Natural Language Processing in Python”.
> Zenodo, 2 June 2024. <https://doi.org/10.5281/zenodo.3519354>.

with this BibTeX entry:

```bibtex
@software{pythainlp,
    title = "{P}y{T}hai{NLP}: {T}hai Natural Language Processing in {P}ython",
    author = "Phatthiyaphaibun, Wannaphong  and
      Chaovavanich, Korakot  and
      Polpanumas, Charin  and
      Suriyawongkul, Arthit  and
      Lowphansirikul, Lalita  and
      Chormai, Pattarawat",
    doi = {10.5281/zenodo.3519354},
    license = {Apache-2.0},
    month = jun,
    url = {https://github.com/PyThaiNLP/pythainlp/},
    version = {v5.0.4},
    year = {2024},
}
```

To cite our [NLP-OSS 2023](https://nlposs.github.io/2023/) academic paper,
use the following:

> Wannaphong Phatthiyaphaibun, Korakot Chaovavanich, Charin Polpanumas,
> Arthit Suriyawongkul, Lalita Lowphansirikul, Pattarawat Chormai,
> Peerat Limkonchotiwat, Thanathip Suntorntip, and Can Udomcharoenchaikit.
> 2023.
> [PyThaiNLP: Thai Natural Language Processing in Python.](https://aclanthology.org/2023.nlposs-1.4)
> In Proceedings of the 3rd Workshop for Natural Language Processing
> Open Source Software (NLP-OSS 2023),
> pages 25–36, Singapore, Singapore.
> Empirical Methods in Natural Language Processing.

with this BibTeX entry:

```bibtex
@inproceedings{phatthiyaphaibun-etal-2023-pythainlp,
    title = "{P}y{T}hai{NLP}: {T}hai Natural Language Processing in {P}ython",
    author = "Phatthiyaphaibun, Wannaphong  and
      Chaovavanich, Korakot  and
      Polpanumas, Charin  and
      Suriyawongkul, Arthit  and
      Lowphansirikul, Lalita  and
      Chormai, Pattarawat  and
      Limkonchotiwat, Peerat  and
      Suntorntip, Thanathip  and
      Udomcharoenchaikit, Can",
    editor = "Tan, Liling  and
      Milajevs, Dmitrijs  and
      Chauhan, Geeticka  and
      Gwinnup, Jeremy  and
      Rippeth, Elijah",
    booktitle = "Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)",
    month = dec,
    year = "2023",
    address = "Singapore, Singapore",
    publisher = "Empirical Methods in Natural Language Processing",
    url = "https://aclanthology.org/2023.nlposs-1.4",
    pages = "25--36",
    abstract = "We present PyThaiNLP, a free and open-source natural language processing (NLP) library for Thai language implemented in Python. It provides a wide range of software, models, and datasets for Thai language. We first provide a brief historical context of tools for Thai language prior to the development of PyThaiNLP. We then outline the functionalities it provided as well as datasets and pre-trained language models. We later summarize its development milestones and discuss our experience during its development. We conclude by demonstrating how industrial and research communities utilize PyThaiNLP in their work. The library is freely available at https://github.com/pythainlp/pythainlp.",
}
```

## Acknowledgements

PyThaiNLP was founded by Wannaphong Phatthiyaphaibun in 2016.
His contributions since 2021 have been made during a PhD studentship
supported by
[Vidyasirimedhi Institute of Science and Technology (VISTEC)][vistec].

The contributions of Arthit Suriyawongkul to PyThaiNLP
from November 2017 until August 2019 were funded by [Wisesight][].
His contributions from November 2019 until October 2024 were made during
a PhD studentship supported by
[Taighde Éireann – Research Ireland][researchireland]
under Grant Number 18/CRT/6224
([Research Ireland Centre for Research Training in Digitally-Enhanced Reality
(d-real)][dreal]).

The contributions of Pattarawat Chormai to PyThaiNLP from 2018 until 2019
were made during a research internship at the
[Natural Language Processing Lab,
Department of Linguistics, Faculty of Arts,
Chulalongkorn University][nlp-chula].

The contributions of Korakot Chaovavanich and Lalita Lowphansirikul
to PyThaiNLP from 2019 until 2022 were funded by the
[VISTEC-depa Thailand AI Research Institute][airesearch].

The Mac Mini M1 used for macOS testing was donated by [MacStadium][].
This hardware was essential for the project's testing suite from October 2022
to October 2023, filling a critical gap before GitHub Actions introduced
native support for Apple Silicon runners.

[vistec]: https://www.vistec.ac.th/
[airesearch]: https://airesearch.in.th/
[wisesight]: https://wisesight.com/
[researchireland]: https://www.researchireland.ie/
[dreal]: https://d-real.ie/
[nlp-chula]: https://attapol.github.io/lab.html
[macstadium]: https://www.macstadium.com/

![VISTEC-depa Thailand AI Research Institute](./docs/images/airesearch-logo.png)
![MacStadium](./docs/images/macstadium-logo.png)

Our only official repository is at
<https://github.com/PyThaiNLP/pythainlp>,
with a mirror at
<https://gitlab.com/pythainlp/pythainlp>.

Beware of malware if you use code from places other than these two.

Made with ❤️ | PyThaiNLP Team 💻 | "We build Thai NLP" 🇹🇭
