Use Case

Text and unstructured content classification with natural language processing (NLP)

Text in documents such as reports, financial statements, and invoices can be extremely rich in information. Extracting content from these sources is often done manually and can be time-consuming due to the unstructured nature of text data. Scigility's expertise in NLP, machine learning, and the analysis and structuring of text helps businesses automate their manual processes, discover insights, and make sense of large amounts of data.

Challenge

Text classification assigns documents to an appropriate category. These categories can represent, for example, customer sentiment or the subject area of a customer question or request. The input data can be very diverse, ranging from PDF documents to customer emails and chatbot messages. When implementing a text classification use case with our frameworks, Scigility typically encounters the following challenges:

  • Collecting unstructured text data is relatively easy; the main challenge is assigning correct labels to the data items and building a good-quality training sample.
  • Real-life text data and scanned documents contain many different kinds of elements (tables, hyperlinks, images, etc.) that require setting up multiple pre-processing pipelines.
  • Many popular language models are pre-trained on large general-purpose text corpora, but they also need fine-tuning on business domain-specific text to achieve and maintain robust performance.
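
As a minimal illustration of one such pre-processing step, the following Python sketch removes markup remnants, replaces hyperlinks with a placeholder token, and normalizes whitespace. It is not Scigility's actual pipeline: the function name clean_text, the <URL> placeholder, and the regular expressions are assumptions made for this example, and real documents (scanned PDFs, tables, images) would need additional, dedicated steps.

```python
import html
import re

# Patterns are deliberately simple; they are assumptions for this sketch,
# not a complete HTML or URL grammar.
TAG_RE = re.compile(r"<[^>]+>")       # HTML-style tags such as <p> or <br>
URL_RE = re.compile(r"https?://\S+")  # http/https hyperlinks
WS_RE = re.compile(r"\s+")            # any run of whitespace


def clean_text(raw: str) -> str:
    """Normalize one raw text item before labeling or model training."""
    text = TAG_RE.sub(" ", raw)          # drop markup remnants first
    text = html.unescape(text)           # decode entities such as &amp;
    text = URL_RE.sub("<URL>", text)     # keep the fact that a link was present
    return WS_RE.sub(" ", text).strip()  # collapse whitespace, trim both ends
```

Calling clean_text on a small HTML fragment such as "<p>Invoice &amp; reminder.<br>See https://example.com/pay for details.</p>" yields a single clean line of text in which the link has been replaced by the placeholder.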

Solution

Scigility puts special emphasis on building good-quality training samples for NLP use cases.

  • We use tools such as Label Studio and Prodigy, in which manual labeling and tagging tasks can be partly automated and made more efficient via pre-labeling.
  • We work with the popular BERT model family and use the variety of pre-trained models available through Hugging Face, spaCy, and fastText.
  • We develop and maintain efficient deployment pipelines to serve our models both on public cloud infrastructure (Azure ML pipelines, AWS SageMaker) and on private cloud or on-premises infrastructure (Kubeflow, MLflow).
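
The idea behind pre-labeling can be sketched in a few lines of Python. The example below is tool-agnostic and does not use the Label Studio or Prodigy APIs; the keyword rules, the names pre_label and route, and the 0.75 threshold are all hypothetical. In practice, a pre-trained or fine-tuned model would typically produce the suggestions. Items with a confident suggestion only need to be confirmed by an annotator, while the rest go to full manual labeling.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical keyword rules standing in for a real model's predictions.
KEYWORDS = {
    "billing": {"invoice", "payment", "refund"},
    "support": {"error", "crash", "login"},
}


@dataclass
class Suggestion:
    text: str
    label: Optional[str]  # None means no suggestion could be made
    score: float          # share of keyword hits that belong to the label


def pre_label(text: str) -> Suggestion:
    """Suggest a label from keyword hits, with a crude confidence score."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    hits = {label: len(tokens & words) for label, words in KEYWORDS.items()}
    total = sum(hits.values())
    if total == 0:
        return Suggestion(text, None, 0.0)
    best = max(hits, key=hits.get)
    return Suggestion(text, best, hits[best] / total)


def route(texts: List[str], threshold: float = 0.75) -> Tuple[List[Suggestion], List[Suggestion]]:
    """Split items into 'confirm only' and 'label manually' queues."""
    confirm, manual = [], []
    for text in texts:
        suggestion = pre_label(text)
        if suggestion.label is not None and suggestion.score >= threshold:
            confirm.append(suggestion)
        else:
            manual.append(suggestion)
    return confirm, manual
```

With this routing, a message that mentions only billing keywords lands in the confirmation queue, whereas a message that mixes billing and support keywords, or matches none, is sent to manual labeling. The threshold trades annotator effort against the risk of accepting wrong suggestions and would need tuning on real data.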

Used Methodology

Scigility Modern Data & AI Architecture
Scigility Data Driven Enablement
Scigility Use Cases Accelerator
Scigility MLOps & AI Industrialization
Learn more about the Scigility Framework

Used Technology

Label Studio, Prodigy, etc. for labeling and tagging automation
BERT and pre-trained models from Hugging Face, spaCy, and fastText
Azure ML, AWS SageMaker, Databricks, MLflow, etc. for MLOps
Learn more about our technologies
