Machine Learning: Leveraging Data to Build the Big Picture | Electronic Design

What you’ll learn:

The different types of machine learning.
The types of supervised and unsupervised machine-learning methods.

Machine learning (ML) is a method of data analysis that automates analytical model building. It’s a branch of artificial intelligence (AI) based on the idea that systems can learn from data, identify patterns, and make decisions with minimal human intervention. ML algorithms build a model from sample data, known as training data, to make predictions or decisions without being explicitly programmed to accomplish a given task.

Such algorithms are used in myriad applications, including medicine, autonomous vehicles, speech recognition, and machine vision, where it’s difficult or infeasible to use traditional algorithms to perform the required tasks. ML is also behind chatbots and predictive text, language-translation apps, and even the shows and movies recommended by Netflix.

When companies employ artificial-intelligence programs, chances are they’re using machine learning. In fact, the two terms are often used interchangeably, and sometimes ambiguously, with ML treated as an all-encompassing form of AI. This sub-field aims to create computer models that exhibit intelligent behaviors similar to those of humans, meaning they can recognize a visual scene, understand text written in natural language, or perform an action in the real world.

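Forms of Machine Learning

ML is related to computational statistics, which focuses on making predictions using computers, but not all ML is statistical learning. Some implementations of ML use data and neural networks in a way that mimics how a biological brain works.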

The study of mathematical optimization provides methods, theory, and application domains to ML. Data mining is another related field of study, focusing on exploratory data analysis through unsupervised learning.

To that end, learning algorithms function on the basis that strategies, algorithms, and inferences that worked well in the past are likely to continue working well in the future. These inferences can be obvious, such as “since the sky is blue today, it will most likely be blue tomorrow.”

They also can be nuanced, meaning that, although the platform may be the same, there can be subtle differences within the subset. For example, if X number of families have geographically separate species with different color variants, there’s a good chance that several Y variants exist.

ML Methods

Machine learning uses a decision-making process that produces results based on the input data, which can be labeled or unlabeled. Most ML algorithms are equipped with an error function that evaluates the model’s predictions.

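If known examples are available, the error function can compare them against the model’s output to assess its accuracy. If the model can fit the data points in the training set more closely, its weights are adjusted to reduce the discrepancy between the known examples and the model’s estimates. The algorithm repeats this evaluate-and-optimize process, updating the weights automatically, until a set level of accuracy is met.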
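The evaluate-and-optimize loop can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the model is a straight line with two weights, the error function is mean squared error, the update rule is plain gradient descent, and the example data, learning rate, and tolerance are all hypothetical.

```python
def mean_squared_error(weights, examples):
    """Error function: average squared gap between estimates and known outputs."""
    w, b = weights
    return sum((w * x + b - y) ** 2 for x, y in examples) / len(examples)


def fit(examples, learning_rate=0.05, tolerance=1e-6, max_steps=10_000):
    """Repeat evaluate-and-adjust until the error drops below `tolerance`."""
    w, b = 0.0, 0.0
    n = len(examples)
    for _ in range(max_steps):
        # Evaluate: stop once the required level of accuracy is met.
        if mean_squared_error((w, b), examples) < tolerance:
            break
        # Optimize: gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in examples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in examples) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical known examples generated from y = 2x + 1.
examples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = fit(examples)
print(f"w={w:.3f}, b={b:.3f}, error={mean_squared_error((w, b), examples):.2e}")
```

On this toy data, the learned weights should settle close to the slope and intercept used to generate the examples.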

The methods (see the figure above) used to reach accurate results fall into four primary categories:

Supervised learning

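Supervised learning is defined by its use of labeled datasets to train algorithms that classify data or predict outcomes accurately. The learning algorithm receives a set of inputs along with the corresponding correct outputs, and it learns by comparing its actual output with the correct outputs to find errors. It then modifies the model accordingly. A cross-validation process is then used to ensure that the model avoids overfitting or underfitting.

Supervised learning helps organizations solve a variety of real-world problems at scale, such as classifying spam into a folder separate from the inbox. Methods used in supervised learning include neural networks, naïve Bayes, linear regression, logistic regression, random forests, support vector machines (SVMs), and more.

Unsupervised learning

Unsupervised learning is used on data that has no historical labels. The system isn’t told the “right answer,” so the algorithm must figure out what it’s being shown. The goal is to explore the data and find structure or patterns hidden within it. This method works well on transactional data.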
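As an illustration of the spam example, here is a minimal naïve Bayes sketch in plain Python. It is not a production filter: the four training messages and their labels are invented, words are split on whitespace, and add-one smoothing is an assumption made so that unseen words don’t zero out a score.

```python
import math
from collections import Counter


def train_naive_bayes(labeled_messages):
    """Count words per label from (text, label) pairs."""
    word_counts = {}          # label -> Counter of words seen under that label
    label_counts = Counter()  # label -> number of training messages
    for text, label in labeled_messages:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability for `text`."""
    word_counts, label_counts, vocabulary = model
    total_messages = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Log prior plus the log likelihood of each word (add-one smoothing).
        score = math.log(label_counts[label] / total_messages)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labeled training set: inputs paired with the correct outputs.
training_data = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "inbox"),
    ("lunch on monday with the team", "inbox"),
]
model = train_naive_bayes(training_data)
print(classify("free prize money", model))
print(classify("monday team meeting", model))
```

In practice, the labeled data would be split into training and validation folds, so that cross-validation can reveal whether the model is overfitting or underfitting.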

For example, it can identify segments of customers with similar attributes who can then be treated similarly in marketing campaigns. Or it can find the main attributes that separate customer segments from each other.

Popular techniques include self-organizing maps, nearest-neighbor mapping, k-means clustering, and singular value decomposition (SVD). These algorithms also are used to segment text topics, recommend items, and identify data outliers. On top of that, they’re used to reduce the number of features in a model through dimensionality reduction, with principal component analysis (PCA) and SVD being two common techniques. Other algorithms applied in unsupervised learning include neural networks, probabilistic clustering methods, and more.

Semi-supervised learning
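Customer segmentation with k-means clustering can be sketched as follows. The customer records (two made-up attributes per customer) are hypothetical, and seeding the centroids with the first k points is a simplifying assumption; real implementations usually pick starting centroids more carefully.

```python
def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    distances = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
    return distances.index(min(distances))


def k_means(points, k, iterations=20):
    # Assumption: the first k points serve as the initial centroids.
    centroids = [points[i] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: group each point with its nearest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            clusters[nearest(point, centroids)].append(point)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:
                centroids[i] = tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
    labels = [nearest(point, centroids) for point in points]
    return centroids, labels


# Hypothetical customers: (monthly spend score, visits per month). No labels given.
customers = [(1.0, 1.0), (8.0, 9.0), (1.5, 2.0), (9.0, 8.0), (1.0, 0.5), (8.5, 9.5)]
centroids, labels = k_means(customers, k=2)
print(labels)
print(centroids)
```

Each label marks the segment a customer falls into, and the centroids summarize the attributes that separate one segment from the other.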

This approach to ML offers a happy medium between the supervised and unsupervised methods. During training, it uses a smaller labeled dataset to guide classification and feature extraction from a larger, unlabeled dataset.

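This type of learning can be used with methods such as classification, regression, and prediction. It addresses the problem of not having enough labeled data (or not being able to afford to label enough data) to train a supervised learning algorithm. It’s also useful when the cost of labeling is too high to allow for a fully labeled training process. Examples of semi-supervised learning include face and object recognition.

Reinforcement learning

Reinforcement learning is often associated with robotics, autonomous vehicles, gaming, and navigation. This method enables the algorithm to discover, via trial and error, which actions produce the greatest rewards.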
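One common way to combine a small labeled set with a larger unlabeled one is self-training, sketched below. The article doesn’t name a specific technique, so treat this as an assumption: a nearest-centroid classifier assigns pseudo-labels only to unlabeled points it is confident about, then refits. The data, labels, and `confidence_margin` threshold are hypothetical, and the sketch expects at least two distinct labels.

```python
def centroids_by_label(labeled):
    """Mean value of the labeled points under each label."""
    sums = {}
    for value, label in labeled:
        total, count = sums.get(label, (0.0, 0))
        sums[label] = (total + value, count + 1)
    return {label: total / count for label, (total, count) in sums.items()}


def self_train(labeled, unlabeled, confidence_margin=1.0, rounds=5):
    """Grow the labeled set with confident pseudo-labels; return the rest unlabeled."""
    labeled = list(labeled)
    remaining = list(unlabeled)
    for _ in range(rounds):
        centroids = centroids_by_label(labeled)
        still_unlabeled = []
        for value in remaining:
            ranked = sorted(centroids, key=lambda lab: abs(value - centroids[lab]))
            nearest, runner_up = ranked[0], ranked[1]
            margin = abs(value - centroids[runner_up]) - abs(value - centroids[nearest])
            if margin >= confidence_margin:
                labeled.append((value, nearest))  # accept the pseudo-label
            else:
                still_unlabeled.append(value)     # too ambiguous to label
        if len(still_unlabeled) == len(remaining):
            break  # no new pseudo-labels were added this round
        remaining = still_unlabeled
    return centroids_by_label(labeled), remaining


# Hypothetical data: two labeled points guide five unlabeled ones.
labeled = [(1.0, "low"), (9.0, "high")]
unlabeled = [1.5, 2.0, 8.0, 8.5, 5.2]
final_centroids, ambiguous = self_train(labeled, unlabeled)
print(final_centroids, ambiguous)
```

Points that stay ambiguous are left unlabeled rather than forced into a class, which limits how far a wrong pseudo-label can propagate.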

Three primary components are associated with this type of learning: the agent (the learner or decision-maker), the environment (everything the agent interacts with), and actions (what the agent can do). The objective is for the agent to choose actions that maximize the expected reward over a given amount of time. The agent can reach the goal quickly by following a good policy. Thus, the goal in reinforcement learning is to learn the best policy.
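The agent, environment, and actions can be made concrete with a small tabular Q-learning sketch. Q-learning is one well-known reinforcement-learning algorithm, chosen here for illustration and not named in the article. The five-cell corridor environment, the reward of 1 at the far end, and the values of `alpha`, `gamma`, and `epsilon` are all assumptions.

```python
import random

N_STATES = 5        # environment: corridor cells 0..4; reaching cell 4 ends an episode
ACTIONS = (-1, +1)  # actions: step left or step right


def step(state, action):
    """Environment response: next state and reward for the agent's action."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy: mostly exploit the best known action, sometimes explore."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state].values())
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [{a: 0.0 for a in ACTIONS} for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            action = choose_action(q, state, epsilon, rng)
            next_state, reward = step(state, action)
            done = next_state == N_STATES - 1
            # Q-learning update: move the estimate toward reward + discounted future value.
            target = reward if done else reward + gamma * max(q[next_state].values())
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
# The learned policy: the highest-valued action in each non-terminal cell.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

Because the only reward sits at the right-hand end, the learned policy should favor stepping right from every cell, which is the quickest route to the goal.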

Dimensionality reduction

The task of dimensionality reduction is to reduce the number of features in a dataset. Often, there are too many variables to process in ML tasks, such as regression or classification. These variables also are called features; the higher the number of features, the more difficult it is to model them. Moreover, some of these features can be redundant, adding unnecessary noise to the dataset.

Dimensionality reduction lowers the number of random variables under consideration by obtaining a set of principal variables. Its approaches can be divided into feature selection and feature extraction.

Applications

Many real-world applications take advantage of machine learning, including artificial neural networks (ANNs), which are modeled after their biological counterparts. These consist of thousands or millions of densely interconnected processing nodes that handle many tasks, including speech recognition/translation, gaming, social networking, medical diagnosis, and more.
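The difference between the two approaches can be shown with a short sketch. Here, feature selection drops columns whose variance falls below a threshold, and feature extraction projects the remaining columns onto their first principal component, found by power iteration. The dataset, the threshold, and the choice of these particular techniques are illustrative assumptions.

```python
def variance(column):
    mean = sum(column) / len(column)
    return sum((v - mean) ** 2 for v in column) / len(column)


def select_features(rows, min_variance):
    """Feature selection: indices of columns that vary enough to be informative."""
    return [i for i, col in enumerate(zip(*rows)) if variance(col) > min_variance]


def first_principal_component(rows, iterations=100):
    """Feature extraction: direction of greatest variance, via power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[v - m for v, m in zip(row, means)] for row in rows]
    # Covariance matrix of the centered data.
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)] for i in range(d)]
    vec = [1.0] * d
    for _ in range(iterations):
        vec = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = sum(v * v for v in vec) ** 0.5
        vec = [v / norm for v in vec]
    return vec


# Hypothetical dataset: the third feature is constant, the first two are correlated.
rows = [(1.0, 2.0, 5.0), (2.0, 4.1, 5.0), (3.0, 5.9, 5.0), (4.0, 8.0, 5.0)]
kept = select_features(rows, min_variance=0.01)
selected_rows = [[row[i] for i in kept] for row in rows]
component = first_principal_component(selected_rows)
# Each sample is now described by a single extracted feature.
projected = [sum(v * c for v, c in zip(row, component)) for row in selected_rows]
print(kept, component, projected)
```

Selection keeps a subset of the original features unchanged, whereas extraction builds a new feature as a combination of them.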

With Facebook, for example, ML personalizes how a member’s feed is delivered. If the member regularly stops to read posts from certain groups, ML will prioritize activity from those groups earlier in the feed.

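In addition, ML is used in speech applications, including speech-to-text, which uses natural language processing (NLP) to convert human speech into text. It also can be found in digital assistants such as Siri and Alexa, which use speech recognition for application interaction. Automated customer service, recommendation engines, computer vision, climate science, and even agriculture are among the many other applications.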