Noisy text is electronically stored text that a text mining software program cannot categorize
properly. In an electronic document, noisy text is characterized by a discrepancy between
the letters and symbols in the HTML code and the author's intended meaning.
Noisy text does not comply with rules the program uses to identify and categorize words, phrases
and clauses in a particular language. Idiomatic expressions, abbreviations, acronyms and
business-specific lingo can all cause noisy text. It is particularly prevalent in the unstructured
text found in blog posts, chat conversations, discussion threads and SMS text
messages. Other potential causes include poor spelling and punctuation, typographical errors and
poor conversions by optical character recognition (OCR) and speech recognition
programs.
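As a rough illustration of why such text defeats rule-based categorization, the sketch below checks the tokens of an SMS-style message against a small word list, then normalizes shorthand before checking again. The vocabulary, the shorthand table and the function names are all hypothetical, chosen only for this example; real text mining software relies on far larger lexicons and more sophisticated rules.

```python
import re

# Hypothetical toy vocabulary; a real tool would use a full lexicon.
VOCABULARY = {"see", "you", "tomorrow", "at", "the", "meeting", "great", "thanks"}

# Hypothetical lookup table mapping SMS shorthand to standard words.
SHORTHAND = {"c": "see", "u": "you", "2moro": "tomorrow", "gr8": "great", "thx": "thanks"}


def tokenize(text):
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def noise_ratio(tokens):
    """Return the fraction of tokens that are not found in VOCABULARY."""
    if not tokens:
        return 0.0
    unknown = [t for t in tokens if t not in VOCABULARY]
    return len(unknown) / len(tokens)


def normalize(tokens):
    """Replace known shorthand with its standard form; keep other tokens as-is."""
    return [SHORTHAND.get(t, t) for t in tokens]


message = "c u 2moro at the meeting, thx"
raw = tokenize(message)   # tokens as the author typed them
clean = normalize(raw)    # tokens after shorthand expansion

print(noise_ratio(raw), noise_ratio(clean))
```

In this toy setup, most of the raw tokens fall outside the vocabulary, while the normalized tokens all match it. A lookup table only handles shorthand that has been anticipated in advance; misspellings and OCR errors generally call for approximate matching instead.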
See also: fuzzy logic, noisy data
This was last updated in May 2012