Social learning for resilient data fusion against data falsification attacks
Computational Social Networks volume 5, Article number: 10 (2018)
Abstract
Background
Internet of Things (IoT) suffers from vulnerable sensor nodes, which are likely to endure data falsification attacks following physical or cyber capture. Moreover, centralized decision-making and data fusion turn decision points into single points of failure, which are likely to be exploited by smart attackers.
Methods
To tackle this serious security threat, we propose a novel scheme for enabling distributed decision-making and data aggregation through the whole network. Sensor nodes in our scheme act following social learning principles, resembling agents within a social network.
Results
We analytically examine under which conditions local actions of individual agents can propagate through the network, clarifying the effect of Byzantine nodes that inject false information. Moreover, we show how our proposed algorithm can guarantee high network performance, even for cases when a significant portion of the nodes have been compromised by an adversary.
Conclusions
Our results suggest that social learning principles are well suited for designing robust IoT sensor networks and enabling resilience against data falsification attacks.
Background
Motivation
Internet of Things (IoT) is expected to play a central role in future digital society. However, to fully adopt this technology, it is crucial to guarantee its security, especially for public utilities whose safety is essential for the wellbeing of our society [1]. Recent cyberattacks that created significant damage have been widely reported, e.g. the self-propagating malware WannaCry, which caused an infamous worldwide network hack in May 2017 [2]. Developing technologies that can guarantee the safety of large information networks, such as IoT, is a challenging but urgent need. As information networks get more closely intertwined within our daily lives, ensuring their security and thus safety is becoming an even more challenging issue.
As the level of security is typically determined by the weakest link, a major dilemma of IoT security lies in the low-complexity sensor networks that are located at the network edge. These sensor networks are usually composed of a large number of autonomous electronic devices, which collect critical information for the control and operation of IoT [3, 4]. By monitoring extensive geographical areas, these networks can enable a wide range of services to society, becoming a key element for the wellbeing of future smart cities [5, 6]. These networks may also perform sensitive tasks, including the surveillance of military or secure zones, intrusion detection for private property, monitoring of drinkable water tanks and protection from chemical attacks [7, 8].
Although the design of secure wireless sensor networks has been widely studied (e.g. [9,10,11] and references therein), there remain many open problems of both theoretical and engineering nature [12]. In particular, as the number of sensors is usually very large, precise management of them is challenging or even infeasible. A significant portion of the sensors might be deployed in unprotected areas, where it is impossible to ensure their physical or cyber security (e.g. war zones, or regions easily accessed by adversaries). Furthermore, sensor nodes are generally not tamper-proof due to cost restrictions, and have limited computing and networking capabilities. Therefore, they may not be capable of employing complex cryptographic or security protocols.
The vulnerability of sensor nodes makes them potential victims of cyber/physical attacks driven by intelligent adversaries. Attacks on information networks are usually categorized into outsider attacks and insider attacks. Outsider attacks include (distributed) denial of service (DoS) attacks, which use the broadcast nature of wireless communications to disrupt the communication capabilities [10]. In contrast, in insider attacks the adversary “recruits” sensor nodes by malware through cyber/wireless means, or directly by physical substitution [13]. Following the classical Byzantine generals problem [14], these “Byzantine nodes” are authenticated, and recognized as valid members of the network. Byzantine nodes can hence generate false data, exhibit arbitrary behaviour, and collude with others to create network malfunctions. In general, insider attacks are considered to be potentially more harmful to information networks than outsider attacks.
The effect of Byzantine nodes and data falsification over distributed sensor networks has been intensely studied; the impact over the network performance has been characterized, and various defense mechanisms have been proposed (cf. [15] for an overview, and also [16,17,18,19,20] for some recent contributions). However, all these works focus on networks with star or tree topology, and rely on centralizing the decision-making in special nodes, called “fusion centers” (FCs), which gather all the sensed data. Therefore, a key element in these approaches is a strong division of labour: ordinary sensor nodes merely sense and forward data, while the processing is done exclusively at the FC, corresponding to a distributed-sensing/centralized-processing approach. This literature implicitly assumes that the FCs are capable of executing secure coding and protocols, and hence are out of the reach of attackers. However, large information networks might require another kind of mediator device, known as data aggregators (DAs), which have the capability to access the cloud through high-bandwidth communication links [21]. DAs are attractive targets for insider attacks, as they might also be located in unsafe locations due to the limited range of sensor node radios. Note that a tampered DA can completely disable the sensing capabilities of all the nodes whose information has been aggregated, generating a single point of failure that is likely to be exploited by smart adversaries [22].
An attractive route to address this issue is to consider distributed-sensing/distributed-processing schemes, which avoid centralized decision-making by distributing processing tasks throughout the network [23]. However, the design of practical distributed-sensing/distributed-processing schemes is a challenging task, as collective computation phenomena usually exhibit highly non-trivial features [24, 25]. In effect, even though the distributed-sensing literature is vast (for classic references cf. [26,27,28], and for more modern surveys see [3, 4, 29, 30]), the construction of optimal distributed schemes is in general NP-hard [31]. Moreover, although in many scenarios the optimal schemes can be characterized as a set of thresholds for likelihood functions, the determination of these thresholds is usually an intractable problem [26]. For example, homogeneous thresholds can be suboptimal even for networks with similar sensors arranged in star topology [32], being only asymptotically optimal in the network size [33]. Moreover, symmetric strategies are not suitable for more complicated network topologies, which require heuristic methods.
Distributed decision-making and social learning
In parallel, significant research efforts have been dedicated to analysing social learning, which refers to the decision-making processes that take place within social networks [34]. In these scenarios, agents make decisions based on two elements: private information that represents the agent’s personal knowledge, and social information derived from previous decisions made by the agent’s peers [35].
Social learning has been investigated in pioneering works that study sequential decision-making of Bayesian agents over simple social network structures [36, 37]. These models showed how, thanks to social interactions, individuals with weak private signals can harvest information from the decisions of other agents [38]. Interestingly, it was also found that aggregation of rational decisions through information cascades could generate suboptimal collective responses, degrading the “wisdom of the crowds” into mere herd behaviour. After these initial findings, researchers have aimed at developing a deeper understanding of information cascades, extending the original models by considering more general cost metrics [39,40,41], and by studying the effects of the network topology on the aggregated behaviour [42,43,44,45]. Non-Bayesian learning models have also been explored, where agents use simple rule-of-thumb methods to exchange information [46,47,48,49,50,51,52].
Social learning plays a crucial role in many important social phenomena, e.g. in the adoption or rejection of new technology, or in the formation of political opinions [34]. Social learning models are particularly interesting for studying information cascades and herd dynamics, which arise when the social information pushes all the subsequent agents to ignore their own personal knowledge and adopt a homogeneous behaviour [37]. Moreover, there has been a renewed interest in understanding information cascades in the context of e-commerce and digital society [45]. For example, information cascades might have tremendous consequences in online stores where customers can see the opinions of previous customers before deciding to buy a product, or in the emergence of viral media contents based on sequential actions of “like” or “dislike”. Therefore, developing a deep understanding of the mechanics behind information cascades, and how they impact social learning, is fundamental for our modern networked society.
The main motivation behind this article is to explore the connections between social learning and secure sensor networks, building a bridge between the research done separately by economists and sociologists on one side and electrical engineers and computer scientists on the other. A key insight for establishing this connection is to realize that each agent’s decision corresponds to a compressed description of his/her private information. Therefore, the fact that agents cannot access the private information of others, but can only observe their decisions, can be understood as a constraint on the communication resources. In this way, social learning can be regarded as an information network that performs distributed inference under communication constraints (see Table 1). Moreover, it would be natural to use social learning principles in the design of distributed-sensing/distributed-processing schemes, with the hope that this might enable additional robustness to decision-making processes in sensor networks.
Contributions
In contrast to almost all the existing research, this work considers powerful topology-aware data falsification attacks, where the adversary knows the network topology and leverages this knowledge to take control of the most critical nodes of the network, whether regular nodes, DAs or FCs. This represents a worst-case scenario where the network structure has been disclosed or inferred through network tomography via traffic analysis [53]. The reason why this adversary model has not been popular in the literature might be that traditional distributed-sensing schemes do not offer any resistance against this kind of attack.
This work presents a distributed-sensing/distributed-processing scheme for sensor networks that uses social learning principles in order to deal with a topology-aware adversary. The scheme is a threshold-based data fusion strategy, related to those considered in [26]. However, its relationship with social decision-making allows an intuitive understanding of its mechanisms. To avoid the security threats introduced by FCs, our scheme adopts tandem or serial decision sequencing [27, 54,55,56,57]. In contrast with some related literature, our analysis does not focus on optimality aspects of data fusion, but aims to illustrate how distributed decision-making can enable network resilience against powerful topology-aware data falsification attacks. We demonstrate how network resilience holds even when a significant number of nodes have been compromised.
Our work exploits a positive effect of information cascades that has been overlooked before: information cascades make a large number of agents/nodes hold equally qualified estimators, generating many locations where a network operator can collect aggregated data. Therefore, information cascades are crucial in our solution for avoiding single points of failure. To enable a better understanding of information cascades, this work extends the results presented in [58], providing a mathematical characterization of information cascades under data falsification attacks. In particular, our results clarify the conditions under which local actions of individual agents can propagate across the network, compromising the collective performance. These results provide a first step towards the clarification of these non-trivial social dynamics, enriching our understanding of decision-making processes in biased social networks.
This paper expands the ideas presented in [59] by developing a formalism that allows considering incomplete or imperfect social information. This formalism is used to overcome the strongest limitation of the scheme presented in [59], namely the fact that each node was required to overhear and store all the previous transmissions in the network. Clearly this cannot take place in a large sensor network, due both to the storage constraints of the nodes, and to the large energy consumption required to transmit and receive across all pairs of nodes [60]. Therefore, this research presents an important step towards practical applications.
The rest of this article is structured as follows: “System model and problem statement” section introduces the system model, describing the network controller and the adversary behaviour. Our social learning data fusion scheme is then described in “Social learning as a data aggregation scheme” section, where some basic statistical properties are explored, and a practical algorithm for implementing the decision rule is derived. “Information cascade” section analyses the mathematical properties of the decision process, providing a geometrical description and a characterization of information cascades. All these ideas are then illustrated in a concrete scenario in “Proof of concept” section. Finally, “Conclusions” section summarizes our main conclusions.
Notation: uppercase letters are used to denote random variables, e.g. X, and lowercase letters their realizations, e.g. x. Boldface letters \(\varvec{X}\) and \(\varvec{x}\) represent random vectors and their realizations, respectively. Also, \(\mathbb {P}_{w}\left\{ X=x \mid Y=y \right\} = \mathbb {P}\left\{ X=x \mid Y=y,W=w \right\}\) is used as a shorthand notation. A table summarizing the symbols and notation used throughout this article can be found in Appendix D.
System model and problem statement
System model
We consider a sensor network of N nodes, each corresponding to an information-processing device that has been deployed in an area of interest. Each node is equipped with sensory equipment to track variables of interest following a scheduled duty cycle. The measurement of the nth sensor node is denoted by \(S_n,\) taking values over a set \(\mathcal {S} \subset \mathbb {R}\) that can be discrete or continuous.^{Footnote 1} Based on these signals, the network needs to infer the value of an underlying binary variable W.
We consider networks where all the nodes have equal sensing capabilities, that is, the signals \(S_n\) are assumed to be identically distributed. Unfortunately, the general distributed detection problem for arbitrarily correlated signals is known to be NP-hard [31]. Hence, for the sake of tractability, it is assumed that the variables \(S_1,\dots , S_N\) are conditionally independent given the event \(\{W=w\},\) ^{Footnote 2} following a probability distribution denoted by \(\mu _w.\) It is also assumed that both \(\mu _0\) and \(\mu _1\) are absolutely continuous with respect to each other [67], i.e. no particular signal determines W unequivocally. This property guarantees that the log-likelihood ratio of these two distributions is always well defined, being given by the logarithm of the corresponding Radon–Nikodym derivative^{Footnote 3} \(\Lambda _S(s) = \log \frac{d \mu _1}{d \mu _0} (s) .\)
In addition to sensing hardware, each node is equipped with limited computing capability and a radio to wirelessly transmit and receive data. Two nodes in the network are assumed to be connected if they can exchange information wirelessly. Note that sensor nodes usually have a very limited battery budget, which imposes severe restrictions on their communication capabilities [68]. Therefore, it is assumed that each node forwards its data to others only by broadcasting a binary variable \(X_n.\) These simple signals do not impose an additional burden on the communication resources, as they could be appended to existing wireless control packets and vice versa, or could be shared by light, ultrasound or other alternative media.
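As a worked illustration (the Gaussian model is an assumption chosen here for concreteness, not a model prescribed by the text), suppose \(\mu _0\) and \(\mu _1\) are Gaussian with common variance \(\sigma ^2\) and means 0 and m, respectively. Both densities are strictly positive on \(\mathbb {R},\) so the two measures are absolutely continuous with respect to each other, and the log-likelihood ratio is an affine function of the signal:
$$\begin{aligned} \Lambda _S(s) = \log \frac{\exp \left( -(s-m)^2/(2\sigma ^2) \right) }{\exp \left( -s^2/(2\sigma ^2) \right) } = \frac{m}{\sigma ^2}\left( s - \frac{m}{2} \right) . \end{aligned}$$
In this example a large positive sample favours \(W=1,\) but \(\Lambda _S(s)\) stays finite for every finite s, which is the sense in which no single signal determines W unequivocally.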
We focus on the case in which the sensing capabilities of each sensor are limited, and hence, any inference about W made based only on the sensed data \(S_n\) cannot achieve a high accuracy. Interestingly, due to the nature of wireless broadcasting, nearby transmissions can be overheard and their information can be fused with what is extracted from the local sensor. The information that a node can extract from overhearing transmissions of other nodes is called “social information”, contrasting with the “sensorial information” that is obtained from the sensed signal \(S_n.\)
Without loss of generality, nodes transmit their signals sequentially according to their indices (i.e. node 1 transmits first, then node 2, etc.).^{Footnote 4} It is assumed that this sequence is randomly chosen, and can be changed by the network operator at any time and be redistributed through the network (cf. “The sensor network operator and the adversary” section). In general the broadcasted signals \(X_1,\dots ,X_{n-1}\) might not be directly observable by the nth agent because of various restrictions, including range limitations of the node’s receiver radio [70], or the limited duty cycles imposed by battery restrictions [68]. Therefore, the social observations obtained by the nth node are represented by \(\varvec{G}_n\in \mathcal {G}_n,\) which can be a random scalar, vector, matrix or other mathematical object. Some cases of interest are as follows:

(i) The k previous decisions: \(\varvec{G}_n = (X_{n-k},\dots ,X_{n-1}).\)

(ii) The average value of all the previous decisions: \(\varvec{G}_n=\frac{1}{n-1} \sum _{k=1}^{n-1} X_k.\)

(iii) The decisions of agents connected by an Erdős–Rényi random network with parameter \(\xi \in [0,1],\) i.e. \(\varvec{G}_n=(Z_1,\dots ,Z_{n-1}) \in \{0,1,e\}^{n-1},\) where
$$\begin{aligned} Z_k = {\left\{ \begin{array}{ll} X_k \quad & \text {with probability }\xi , \\ e \quad & \text {with probability } 1-\xi .\end{array}\right. } \end{aligned}$$(1)
Please note that the Erdös–Rényi model in (iii) has only been used as an illustrative example, and it can be easily generalized to consider the topology of any stochastic network of interest.
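The three observation models listed above can be sketched in a few lines of Python. This is only an illustration of how \(\varvec{G}_n\) could be built from the list of previous broadcasts; the function names (`last_k`, `running_average`, `erasure_view`) and the string `"e"` used for the erasure symbol are hypothetical choices made here, not part of the original scheme.

```python
import random

ERASED = "e"  # stands for the erasure symbol e in {0, 1, e} (illustrative encoding)

def last_k(decisions, k):
    """Case (i): the k most recent broadcasts X_{n-k}, ..., X_{n-1}."""
    return tuple(decisions[-k:]) if k > 0 else ()

def running_average(decisions):
    """Case (ii): average of all previous broadcasts X_1, ..., X_{n-1}."""
    return sum(decisions) / len(decisions)

def erasure_view(decisions, xi, rng):
    """Case (iii): each previous broadcast is observed with probability xi,
    and replaced by the erasure symbol otherwise."""
    return tuple(x if rng.random() < xi else ERASED for x in decisions)

previous = [1, 0, 1, 1]           # X_1, ..., X_4, i.e. what precedes node n = 5
rng = random.Random(0)            # seeded only so that the sketch is repeatable
print(last_k(previous, 2))        # (1, 1)
print(running_average(previous))  # 0.75
print(erasure_view(previous, 0.5, rng))
```

Cases (i) and (ii) keep the size of the social observation bounded as n grows, whereas case (iii) produces a vector whose length grows with n.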
In this work, we study the social dynamics based on the properties of the transition probability from state \(\varvec{g'}\in \mathcal {G}_{n-1}\) to \(\varvec{g}\in \mathcal {G}_{n},\) as given by the conditional probabilities
where \(x_{n-1}\in \{0,1\}.\) It is also assumed that the social dynamics are causal, meaning that \(\varvec{G}_n\) is conditionally independent of \(S_m\) given W for all \(m\ge n.\)
The sensor network operator and the adversary
The network is managed by a network operator, who is an external agent that uses the network as a tool to build an estimate of W. The network operator is opposed by an adversary, whose goal is to disrupt the inference capabilities of the network. To this end, the adversary controls, without being noticed by the network operator, a group of authenticated Byzantine nodes that have been captured by malware through cyber/wireless means, or by physical substitution.
The overall performance of a network of N nodes is defined by the accuracy of the inference of the last node in the decision sequence. As the decision sequence is generated randomly by the network operator, every node is equally likely to be at the end of the decision sequence. It is further assumed that the adversary has no knowledge of the decision sequence, as it can be chosen at runtime and changed regularly. Therefore, as the adversary has no reason to target any particular node in the network, it is reasonable to assume that the adversary captures nodes randomly. Byzantine nodes are hence assumed to be uniformly distributed over the network.
For simplicity, we model the strength of the attack with a single parameter \(p_{\text{b}},\) which corresponds to the probability of a node being compromised.^{Footnote 5} Moreover, we assume that the capture probability does not depend on W. Hence, the number of Byzantine nodes, denoted by \(N^*,\) is a Binomial random variable with \(\mathbb {E} \left\{ N^* \right\} = p_{\text{b}}N.\) Due to the law of large numbers, \(N^*\approx p_{\text{b}}N\) for a large network, and hence, \(p_{\text{b}}\) is also the ratio of expected Byzantine nodes in the network, which is the traditional metric for attack strength used in the literature.
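To make the law-of-large-numbers statement concrete, note that a Binomial variable with N trials and success probability \(p_{\text{b}}\) has the following mean and variance, so the fraction of captured nodes concentrates around \(p_{\text{b}}\) at rate \(1/\sqrt{N}\):
$$\begin{aligned} \mathbb {E}\left\{ N^* \right\} = p_{\text{b}}N, \qquad \text {Var}\left\{ N^* \right\} = N p_{\text{b}}(1-p_{\text{b}}), \qquad \text {std}\left\{ \frac{N^*}{N} \right\} = \sqrt{\frac{p_{\text{b}}(1-p_{\text{b}})}{N}}. \end{aligned}$$
For instance, with the illustrative values \(N=1000\) and \(p_{\text{b}}=0.2\) (chosen here only as an example), the expected number of Byzantine nodes is 200 with a standard deviation of \(\sqrt{160}\approx 12.6,\) i.e. the captured fraction deviates from 0.2 by roughly 0.013.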
For enabling data processing and forwarding, the network operator defines a strategy, i.e. a data fusion scheme given by a collection of (possibly stochastic) functions \(\{\pi _n\}_{n=1}^\infty,\) such that \(\pi _n:\mathcal {S}\times \mathcal {G}_n \rightarrow \{0,1\}\) for all \(n\in \mathbb {N}.\) On the other hand, the adversary can freely set the values of the binary signals transmitted by Byzantine nodes. This can be modelled as a random mapping \(C{:}\, \{0,1\}\rightarrow \{0,1\}\) that corrupts broadcasted signals. Therefore, the signal broadcasted by the nth node is given by
Furthermore, as broadcasted signals are binary, the corruption mapping \(C(\cdot )\) can be characterized by the conditional probabilities \(c_{00}\) and \(c_{01},\) where \(c_{ij} = \mathbb {P}\left\{ C(\pi ) = i \mid \pi = j \right\} .\)
The rest of this work focuses on the case in which the network operator can deduce the corruption function and can estimate the capture risk \(p_{\text{b}}.\) Then, the average network miss-detection and false alarm rates for an attack of intensity \(p_{\text{b}}\) are defined as
respectively (note that \(p_{\text{b}}\) implicitly affects the distribution of \(\varvec{G}_N\)). The case in which these quantities are unknown can be addressed using the current framework with a minmax analysis, which is left for future studies.
Problem statement
Our goal is to develop a resilient strategy that provides a reliable estimate of W even in the presence of a significant number of unidentified Byzantine nodes. Note that in most surveillance applications miss-detections are more important than false alarms, as the cost of the worst-case scenario is difficult to estimate. Therefore, the average network performance is evaluated following the Neyman–Pearson criterion, by setting an allowable false alarm rate \(\alpha\) and focusing on reducing the miss-detection rate [72]. By denoting by \(\mathcal {P}\) the set of all strategies, we have the following optimization problem:
Finding an optimal solution to (6) is a formidable challenge, even for the simple case of networks with star topology and no Byzantine attacks (see [30, 73] and references therein). Therefore, our aim is to develop a suboptimal strategy that enables resilience, while being suitable for implementation in sensor nodes with limited computational power.
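Read together with the description above, the Neyman–Pearson formulation can be sketched as a constrained minimization. The symbols \(P_{\text{MD}}\) and \(P_{\text{FA}}\) are placeholder names introduced here for the average network miss-detection and false alarm rates; the display below is a hedged reading of the text, not the authors' typeset equation.
$$\begin{aligned} \min _{\{\pi _n\} \in \mathcal {P}} \; P_{\text{MD}}\left( \{\pi _n\}; p_{\text{b}} \right) \quad \text {subject to} \quad P_{\text{FA}}\left( \{\pi _n\}; p_{\text{b}} \right) \le \alpha . \end{aligned}$$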
Social learning as a data aggregation scheme
This section describes our proposed data fusion scheme, and explains its function against topology-aware data falsification attacks. In the sequel, “Data fusion rule” section describes and analyses the data fusion rule, then “Decision statistics” section derives basic properties of its statistics, and finally “An algorithm for computing the social log-likelihood” section presents a practical algorithm for its implementation.
Data fusion rule
Let us assume that each sensor node is a rational agent that tries to maximize the profit of an inference within a social network. Rational agents follow Bayesian strategies,^{Footnote 6} which can be elegantly described by the following threshold-based decision rule [72, Chapt. 2]:
Above, \(u(\pi _n,w)\) is a cost assigned to the decision \(\pi _n\) when \(W=w,\) which can be engineered in order to match the relevance of miss-detections and false alarms [72].
Let us find a simpler expression for the decision rule (7). Due to the causality constraint (cf. “System model” section), \(\varvec{G}_n\) can only be influenced by \(S_1,\dots ,S_{n-1};\) and therefore, it is conditionally independent of \(S_n\) given W. Using this conditional independence condition, one can find that
where \(\Lambda _S(S_n)\) is the log-likelihood ratio of \(S_n\) (cf. “System model” section) and \(\Lambda _{\varvec{G}_{n}}(\varvec{G}_{n})\) is the log-likelihood ratio of \(\varvec{G}_{n}.\) Then, using (8) one can rewrite (7) as
where \(\tau _0 = \log \frac{ \mathbb {P}\left\{ W=0 \right\} }{ \mathbb {P}\left\{ W=1 \right\} } + \log \frac{ u(0, 0) - u(1, 0) }{ u(1, 1) - u(0, 1) }.\) In simple words, (9) states how the nth node should fuse the private and social knowledge: the evidence is provided by the corresponding log-likelihood terms, which are simply added and then compared against a fixed threshold.^{Footnote 7}
Further understanding of the above decision rule can be attained by studying it from the point of view of communication theory [58]. We first note that the decision is made not over the raw signal \(S_n\) but over the “decision signal” \(\Lambda _S(S_n).\) Interestingly, the processing done by the function \(\Lambda _S(\cdot )\) might serve for dimensionality reduction, as \(\Lambda _S(S_n)\) is always a single number even though \(S_n\) may be a matrix or a high-dimensional vector. Due to their construction and the underlying assumptions over \(S_n\) (cf. “System model” section), the variables \(\Lambda _S(S_n)\) are identically distributed and conditionally independent given \(W=w.\) Moreover, by introducing the shorthand notation \(\tau _n (\varvec{G}_n) = \tau _0 - \Lambda _{\varvec{G}_n}(\varvec{G}_n),\) one can rewrite (9) as
Therefore, the decision is made by comparing the decision signal with a decision threshold \(\tau _n(\varvec{G}_n),\) which can be efficiently computed using the algorithm proposed in “An algorithm for computing the social log-likelihood” section. Note that this represents a comparison between the sensed data, summarized by \(\Lambda _S(S_n),\) and the social information carried by \(\tau _n(\varvec{G}_n).\)
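The fusion rule described above (add the sensor and social log-likelihood ratios, compare the sum with \(\tau _0\)) can be illustrated with a minimal Python sketch. Several choices here are assumptions made for illustration only: the signals are taken to be unit-variance Gaussians with mean 0 under \(W=0\) and mean m under \(W=1,\) so that \(\Lambda _S(s) = m s - m^2/2\); the node is assumed to decide 1 when the sum strictly exceeds the threshold; and the function names and the dictionary layout of the cost u are hypothetical.

```python
import math

def tau0(prior_w0, prior_w1, u):
    """Initial threshold from the priors and the costs u[(decision, w)]."""
    return (math.log(prior_w0 / prior_w1)
            + math.log((u[(0, 0)] - u[(1, 0)]) / (u[(1, 1)] - u[(0, 1)])))

def llr_gaussian(s, m):
    """Log-likelihood ratio of a unit-variance Gaussian sample (means 0 vs m)."""
    return m * s - 0.5 * m * m

def decide(s, social_llr, threshold, m):
    """Fuse sensor and social evidence; assumed convention: decide 1 if above."""
    return 1 if llr_gaussian(s, m) + social_llr > threshold else 0

# Symmetric costs and equal priors give a zero threshold.
costs = {(0, 0): 0.0, (1, 1): 0.0, (1, 0): 1.0, (0, 1): 1.0}
t0 = tau0(0.5, 0.5, costs)
print(t0)                         # 0.0
print(decide(1.0, 0.0, t0, 1.0))  # 1: the sensor evidence alone favours W = 1
print(decide(1.0, -2.0, t0, 1.0)) # 0: strong social evidence for W = 0 overrides it
```

Equivalently, the node compares \(\Lambda _S(S_n)\) with the shifted threshold \(\tau _0 - \Lambda _{\varvec{G}_n}(\varvec{G}_n),\) which is how the social information enters the decision in the second and third calls.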
Decision statistics
Let us find expressions for the probabilities of the actions of the nth agent, first focusing on the case \(n=1.\) Note that
where \(F_w^\Lambda (\cdot )\) is the c.d.f. of \(\Lambda _S\) conditioned on \(W=w.\) Then, considering the possibility that the first node could be a Byzantine node, one can show that
where we are introducing \(z_0:= p_{\text{b}}c_{01}\) and \(z_1:= 1 - p_{\text{b}}(1-c_{00} + c_{01} )\) as shorthand notation, which are non-negative constants that summarize the strength of the adversary. In particular, when the adversary is powerless then \(z_0=0\) and \(z_1 = 1,\) and hence \(\mathbb {P}_{w}\left\{ \pi _1(S_1)=0 \right\} = \mathbb {P}_{w}\left\{ X_1=0 \right\}.\)
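The role of \(z_0\) and \(z_1\) can be checked numerically. The sketch below enumerates the two cases (honest node, Byzantine node) for a broadcast equal to 0 and compares the result with the affine form \(z_0 + z_1 f,\) where f stands for the probability that the uncorrupted decision is 0. The affine identity is inferred here from the definitions of \(z_0,\) \(z_1\) and \(c_{ij}\) rather than quoted from the source, and all function names are hypothetical.

```python
def adversary_constants(p_b, c00, c01):
    """Shorthand constants z0 = p_b*c01 and z1 = 1 - p_b*(1 - c00 + c01)."""
    z0 = p_b * c01
    z1 = 1.0 - p_b * (1.0 - c00 + c01)
    return z0, z1

def prob_broadcast_zero(p_b, c00, c01, f):
    """P{X = 0} by direct enumeration, with f = P{pi = 0}.

    c00 = P{C(pi) = 0 | pi = 0} and c01 = P{C(pi) = 0 | pi = 1}.
    """
    honest = (1.0 - p_b) * f
    byzantine = p_b * (c00 * f + c01 * (1.0 - f))
    return honest + byzantine

def lam(p, x):
    """lambda(p, x) = x(1 - p) + (1 - x)p: equals p for x = 0 and 1 - p for x = 1."""
    return x * (1.0 - p) + (1 - x) * p

z0, z1 = adversary_constants(p_b=0.3, c00=0.2, c01=0.9)
direct = prob_broadcast_zero(0.3, 0.2, 0.9, f=0.6)
print(abs(direct - (z0 + z1 * 0.6)) < 1e-12)  # True: both expressions agree
print(adversary_constants(0.0, 0.2, 0.9))     # (0.0, 1.0): powerless adversary
```

The helper `lam` simply selects between the probability of broadcasting 0 and its complement, which is how the function \(\lambda (p,x)\) is used to write both outcomes in one expression.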
By considering the nth node, one can find that
The first equality is a consequence of the fact that \(S_n\) is conditionally independent of \(\varvec{G}_n\) given \(W=w,\) while the second equality is a consequence of the fact that \(X_n\) can be expressed as a deterministic function of \(\varvec{G}_{n}\) and \(S_n,\) and hence becomes conditionally independent of W. Above, (16) shows that \(\tau _n\) is a sufficient statistic for predicting \(X_n\) with respect to \(\varvec{G}_{n}.\) Note that \(F_w^\Lambda (x)\) can be directly computed from the statistics of the distribution of \(S_n\) (cf. Appendix A). Moreover, using (16) and following a similar derivation as in (12), one can conclude that
Let us now study the statistics of \(\varvec{G}_n.\) By using the definition of the transition coefficients \(\beta _w^n(\varvec{g}_{n+1} \mid x_n,\varvec{g}_{n}),\) one can find that
Note that, using the above derivations, the terms \(\mathbb {P}_{w}\left\{ X_{n} = x_n, \varvec{G}_{n} = \varvec{g}_{n} \right\}\) can be further expressed as
where \(\lambda (p,x) = x (1p) + (1x) p.\) Therefore, a closed form expression can be found for (18) recursively over \(\varvec{G}_n.\)
An algorithm for computing the social log-likelihood
The main challenge for implementing (9) as a data processing method in a sensor node is to have an efficient algorithm for computing \(\tau _n(\varvec{g}_n).\) Leveraging the above derivations, we develop Algorithm 1 as an iterative procedure for computing \(\tau _n.\)
The inputs of Algorithm 1 can be classified into two groups. First, the terms \(N,F_0^\Lambda (\cdot ),F_1^\Lambda (\cdot ),\beta _w^n(\cdot \mid \cdot ,\cdot )\) are properties of the network (position of the node within the decision sequence, sensor statistics and social observability, respectively) that the network operator could measure. On the other hand, \(\tau _0,z_0,z_1\) are properties of the adversary profile that depend on the prior statistics of W, the rate of compromised nodes \(p_{\text{b}}\) and the corruption function defined by \(c_{00}\) and \(c_{01}\) (cf. “The sensor network operator and the adversary” section). In most scenarios, the knowledge of the network controller about these quantities is limited, as attacks are rare and might follow unpredictable patterns. Limited knowledge can still be exploited using e.g. Bayesian estimation techniques [75]. If no knowledge is available to the network controller, then these quantities can be considered free parameters of the strategy that span a range of alternative balances between miss-detections and false positives, i.e. a receiver operating characteristic (ROC) space.
Algorithm 1 initialises from the initial decision threshold \(\tau _0,\) and explores all the relevant scenarios iteratively in order to build estimates of the likelihood functions that are required to compute \(\tau _N.\) The computation of the terms \(\mathbb {P}_{w}\left\{ \varvec{G}_n=\varvec{g} \right\}\) is done following (18), while the ones involving \(\mathbb {P}_{w}\left\{ X_n=x_n,\varvec{G}_n=\varvec{g} \right\}\) follow (20). Note that the algorithm’s complexity scales gracefully for many cases of interest. For the particular case of nodes with memory of length k (i.e. \(\varvec{G}_n=(X_{n-k-1},\dots ,X_{n-1})\)), the complexity of Algorithm 1 is \(\mathcal {O}( 2^k N),\) and therefore grows linearly with the size of the network, while being limited in the values of k that one can consider. In general, the algorithm complexity scales linearly with N as long as the cardinalities of the sets \(\mathcal {G}_n\) are bounded, or if a significant portion of the terms \(\beta _w^n(\varvec{g}_{n+1} \mid x_n,\varvec{g}_n)\) are zero.
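The listing of Algorithm 1 is not reproduced in this text, and the sketch below should not be read as that algorithm. It is a minimal illustration of the same kind of forward recursion for the simplest case in which each node observes only the previous broadcast (\(\varvec{G}_n = X_{n-1}\)). The Gaussian sensor model (unit variance, means 0 and m), the function names, and the use of the affine form \(z_0 + z_1 F_w^\Lambda (\tau )\) for the probability of broadcasting 0 are assumptions made here for illustration.

```python
import math

def phi(x):
    """Standard normal c.d.f."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def llr_cdf(x, w, m):
    """C.d.f. of Lambda_S given W = w for unit-variance Gaussian signals.

    With means 0 and m, Lambda_S = m*S - m^2/2 is Gaussian with
    mean -m^2/2 (w = 0) or +m^2/2 (w = 1) and standard deviation m.
    """
    mean = 0.5 * m * m if w == 1 else -0.5 * m * m
    return phi((x - mean) / m)

def memory_one_thresholds(n_nodes, tau_0, z0, z1, m):
    """Forward recursion for G_n = X_{n-1}; degenerate cases where a
    broadcast probability hits 0 or 1 are not handled in this sketch."""
    # p[w] = P_w{X_n = 0}; the first node has no social information.
    p = {w: z0 + z1 * llr_cdf(tau_0, w, m) for w in (0, 1)}
    history, thresholds = [dict(p)], []
    for _ in range(2, n_nodes + 1):
        # Likelihood of each social observation g under both hypotheses.
        prob_g = {w: {0: p[w], 1: 1.0 - p[w]} for w in (0, 1)}
        # tau_n(g) = tau_0 - log( P_1{G_n = g} / P_0{G_n = g} ).
        tau = {g: tau_0 - math.log(prob_g[1][g] / prob_g[0][g]) for g in (0, 1)}
        # Marginalise over g to obtain the statistics of the new broadcast.
        p = {w: sum(prob_g[w][g] * (z0 + z1 * llr_cdf(tau[g], w, m))
                    for g in (0, 1)) for w in (0, 1)}
        thresholds.append(tau)
        history.append(dict(p))
    return thresholds, history

thresholds, history = memory_one_thresholds(5, tau_0=0.0, z0=0.0, z1=1.0, m=1.0)
print(thresholds[0])  # threshold of node 2 for each value of the observed X_1
print(history[-1])    # P_w{X_5 = 0} for w = 0 and w = 1
```

Each iteration touches a fixed number of social states (two in this case), so the cost is linear in the number of nodes, in line with the \(\mathcal {O}(2^k N)\) scaling discussed above for bounded memory.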
Information cascade
The term “social learning” refers to the fact that \(\pi _n(S_n,\varvec{G}_n)\) becomes a better predictor of W as n grows; and hence, larger networks tend to develop a more accurate inference. However, as the number of shared signals grows, the corresponding “social pressure” can make nodes ignore their individual measurements and blindly follow the dominant choice, triggering a cascade of homogeneous behaviour. It is our interest to clarify the role of the social pressure in the decision-making of the agents involved in a social network, as information cascades can introduce severe limitations in the asymptotic performance of social learning [44].
Moreover, an adversary can leverage the information cascade phenomenon. In effect, if the number of Byzantine nodes \(N^*\) is large enough then a misleading information cascade can be triggered almost surely, causing the learning process to fail. However, if \(N^*\) is not large enough then the network may undo the pool of wrong opinions and end up triggering a correct cascade.
In the sequel, the effect of information cascades is first studied in individual nodes in the “Local information cascades” section. Then, the propagation properties of cascades are explored in the “Social information dynamics and global cascades” section.
Local information cascades
In general, the decision \(\pi _n(S_n,\varvec{G}_n)\) is made based on the evidence provided by both \(S_n\) and \(\varvec{G}_{n}.\) A local cascade takes place in the nth agent when the information conveyed by \(S_n\) is ignored in the decision-making process due to a dominant influence of \(\varvec{G}_n.\) We use the term “local” to emphasize that this event is related to the data fusion of an individual agent. This idea is formalized in the following definition using the notion of conditional mutual information [76], denoted as \(I(\cdot ;\cdot |\cdot ).\)
Definition 1
The social information \(\varvec{g}_{n} \in \mathcal {G}_n\) generates a local information cascade for the nth agent if \(I(\pi _n;S_n|\varvec{G}_n = \varvec{g}_n) = 0.\)
The above condition summarizes two possibilities: either \(\pi _n\) is a deterministic function of \(\varvec{G}_n,\) and hence there is no variability in \(\pi _n\) once \(\varvec{G}_n\) has been determined; or there is still variability (i.e. \(\pi _n\) is a stochastic strategy) but it is conditionally independent of \(S_n.\) In both cases, the above formulation highlights the fact that the decision \(\pi _n\) contains no information coming from \(S_n.\) ^{Footnote 8}
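Definition 1 can be checked numerically for a finite signal alphabet. The following minimal sketch assumes a deterministic threshold rule that decides 1 when the signal log-likelihood exceeds the threshold; `signal_pmf` (the distribution of \(S_n\) given the social information) and `llr` are hypothetical inputs chosen for illustration only.

```python
import math


def mutual_information(joint):
    """I(A;B) in bits from a joint pmf given as {(a, b): probability}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(
        p * math.log2(p / (pa[a] * pb[b]))
        for (a, b), p in joint.items()
        if p > 0
    )


def decision_signal_information(signal_pmf, llr, tau):
    """I(pi; S | G = g) for the assumed rule pi = 1 if llr[s] > tau else 0."""
    joint = {}
    for s, p in enumerate(signal_pmf):
        decision = 1 if llr[s] > tau else 0
        joint[(decision, s)] = p
    return mutual_information(joint)


if __name__ == "__main__":
    pmf = [0.25, 0.25, 0.5]   # illustrative distribution of S_n given G_n = g
    llr = [-1.0, 0.0, 1.0]    # illustrative signal log-likelihoods
    print(decision_signal_information(pmf, llr, 0.5))  # decision depends on S_n
    print(decision_signal_information(pmf, llr, 2.0))  # decision is constant
```

In the first call the decision splits the signals into two equally likely groups, so it carries one bit about \(S_n\); in the second call the threshold exceeds every signal log-likelihood, the decision is constant, and the mutual information vanishes, which is the cascade condition.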
Lemma 1
The variables \(\varvec{G}_n \rightarrow \tau _n \rightarrow \pi _n\) form a Markov chain (i.e. \(\tau _n\) is a sufficient statistic of \(\varvec{G}_n\) for predicting the decision \(\pi _n\)).
Proof
Using (16) one can find that
and therefore the conditional independence of \(\pi _n\) and \(\varvec{G}_n\) given \(\tau _n\) is clear. \(\square\)
Let us now introduce the notation \(U_s = {{\mathrm{ess\,sup}}}_{s\in \mathcal {S}} \Lambda _S(S_n=s)\) and \(L_s = {{\mathrm{ess\,inf}}}_{s\in \mathcal {S}} \Lambda _S(S_n=s)\) for the essential supremum and infimum of \(\Lambda _S(S_n),\) which correspond to the signals within \(\mathcal {S}\) that most strongly support the hypothesis \(\{W=1\}\) over \(\{W=0\}\) and vice versa.^{Footnote 9} If one of these quantities diverges, this would imply that there are signals \(s\in \mathcal {S}\) that provide overwhelming evidence in favour of one of the competing hypotheses. If both are finite then the agents are said to have bounded beliefs [44]. As sensory signals of electronic devices are ultimately processed digitally, the number of different signals that an agent can obtain is finite, and hence their supremum is always finite. Therefore, in the sequel we assume that both \(L_s\) and \(U_s\) are finite. Using these notions, the following proposition provides a characterization for local information cascades.
Proposition 1
The social information \(\varvec{g}_{n} \in \mathcal {G}_n\) triggers a local information cascade if and only if the agents have bounded beliefs and \(\tau _n(\varvec{g}_{n}) \notin [L_s,U_s]\).
Proof
Let us assume that the agents have bounded beliefs. From the definition of \(F_w^\Lambda,\) which is a cumulative distribution function, it is clear that if \(\tau _n<L_s\) then \(F_0^\Lambda (\tau _n) = F_1^\Lambda (\tau _n) = 0,\) while if \(\tau _n>U_s\) then \(F_0^\Lambda (\tau _n) = F_1^\Lambda (\tau _n) = 1.\) Therefore, if \(\tau _n(\varvec{g}_{n}) \notin [L_s,U_s]\) then, according to (16), it determines \(\pi _n\) almost surely, making \(\pi _n\) and \(S_n\) conditionally independent.
To prove the converse by contrapositive, let us assume that \(L_s< \tau _n(\varvec{g}_{n}) < U_s.\) Using again (16) and the definition of \(U_s\) and \(L_s\), one can conclude that \(0< \mathbb {P}_{w}\left\{ \pi _n=0|\varvec{G}_n \right\} < 1\) for both \(w\in \{0,1\}.\) This, in turn, implies that the sets \(\mathcal {S}^0(\tau ) = \{ s\in \mathcal {S} | \Lambda _S(s) < \tau _n(\varvec{G}_n) \}\) and \(\mathcal {S}^1(\tau ) = \mathcal {S} \setminus \mathcal {S}^0(\tau )\) both have positive probability under \(\mu _0\) and \(\mu _1,\) which implies the existence of conditional dependence between \(\pi _n\) and \(S_n\) in this case. \(\square\)
Intuitively, Proposition 1 shows that a local information cascade happens when the evidence carried by the social information exceeds, in either direction, that of the most informative signal that could be sensed. Some consequences of this result are explored in the next section.
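Proposition 1 suggests a simple numerical test for finite signal alphabets. The sketch below assumes that \(\Lambda _S\) is the log-likelihood ratio between the two conditional signal distributions, given here as hypothetical lists `p0` and `p1`; thresholds lying exactly on the boundary are treated as inside the interval, following the closed interval used in the proposition.

```python
import math


def llr_bounds(p0, p1):
    """Return (L_s, U_s) for a finite alphabet with pmfs p0 (W=0) and p1 (W=1)."""
    llrs = []
    for a, b in zip(p0, p1):
        if a == 0 and b == 0:
            continue  # signal never occurs, so it does not affect ess sup/inf
        if a == 0:
            llrs.append(math.inf)   # overwhelming evidence for W = 1
        elif b == 0:
            llrs.append(-math.inf)  # overwhelming evidence for W = 0
        else:
            llrs.append(math.log(b / a))
    return min(llrs), max(llrs)


def is_local_cascade(tau, p0, p1):
    """True when the threshold lies outside [L_s, U_s]."""
    lower, upper = llr_bounds(p0, p1)
    return not (lower <= tau <= upper)


if __name__ == "__main__":
    p0 = [0.5, 0.3, 0.2]  # illustrative signal pmf under W = 0
    p1 = [0.2, 0.3, 0.5]  # illustrative signal pmf under W = 1
    print(llr_bounds(p0, p1))
    print(is_local_cascade(0.0, p0, p1), is_local_cascade(1.5, p0, p1))
```

If some signal has zero probability under one hypothesis only, the corresponding bound becomes infinite and no threshold on that side can trigger a cascade, which matches the role of the bounded-beliefs condition.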
Social information dynamics and global cascades
It is of great interest to predict when a local information cascade could propagate across the network, disrupting the collective behaviour and hence affecting the network performance. The following definition captures how, during a “global information cascade”, the broadcast signals \(X_n\) no longer convey information about the corresponding sensor signals.
Definition 2
The social information \(\varvec{g}_n\in \mathcal {G}_n\) triggers a global information cascade if \(I(X_m;S_m|\varvec{G}_n = \varvec{g}_n) = 0\) holds for all \(m\ge n.\)
A global information cascade is a succession of local information cascades. As Proposition 1 showed that agents are free from local cascades as long as \(\tau _n\in [L_s,U_s],\) one can guess that global cascades are related to the dynamics of \(\tau _n.\) These dynamics are determined by the transitions of \(\varvec{G}_n,\) which follows the behaviour dictated by the transition coefficients \(\beta _w^n(\cdot |\cdot ,\cdot ).\) To further study the social information dynamics, we introduce the following definitions.
Definition 3
The collection \(\{\varvec{G}_n\}_{n=1}^\infty\) is said to have:

1.
Strongly consistent transitions if, for any \(W=w,\) \(\varvec{g}\in \mathcal {G}_n\) and \(\varvec{g'}\in \mathcal {G}_{n-1},\) \(\beta _w^n( \varvec{g}|1,\varvec{g'} )>0\) implies \(\tau _{n}(\varvec{g}) \le \tau _{n-1}(\varvec{g'}),\) while \(\beta _w^n(\varvec{g}|0,\varvec{g'})>0\) implies \(\tau _{n}(\varvec{g}) \ge \tau _{n-1}(\varvec{g'}).\)

2.
Weakly consistent transitions if, for all \(\varvec{g}\in \mathcal {G}_n\) and \(\varvec{g'}\in \mathcal {G}_{n-1},\) \(\tau _{n-1}(\varvec{g'}) \le L_s\) and \(\mathbb {P}_{w}\left\{ \varvec{G}_n=\varvec{g}|\varvec{G}_{n-1}=\varvec{g'} \right\} >0\) imply \(\tau _{n}(\varvec{g}) \le L_s,\) while \(\tau _{n-1}(\varvec{g'}) \ge U_s\) and \(\mathbb {P}_{w}\left\{ \varvec{G}_n=\varvec{g}|\varvec{G}_{n-1}=\varvec{g'} \right\} >0\) imply \(\tau _{n}(\varvec{g}) \ge U_s.\) ^{Footnote 10}
Intuitively, strong consistency means that the decision threshold evolves monotonically with respect to the broadcast signals \(X_n.\) Correspondingly, weak consistency implies that \(\tau _n\) cannot return to the interval \([L_s,U_s]\) once it goes out of it. The adjectives “strong” and “weak” reflect the fact that weak consistency only constrains the dynamics outside the boundaries of the signal likelihood, while strong consistency affects the whole decision space. Moreover, strongly consistent transitions imply weakly consistent transitions when there are no Byzantine nodes, as shown in the next lemma.^{Footnote 11}
Lemma 2
Strongly consistent transitions satisfy the weak consistency condition if \(p_{\text{b}}=0\).
Proof
See Appendix B. \(\square\)
Next, it is shown that if the evolution of \(\varvec{G}_n\) becomes deterministic and 1–1 after leaving the interval \([L_s,U_s]\) (henceforth called weakly invertible transitions), then it satisfies the weak consistency condition.
Lemma 3
Weakly invertible transitions imply weakly consistent transitions.
Proof
See Appendix C. \(\square\)
Now we present the main result of this section, which is the characterization of information cascades for the case of social information that follows weakly consistent transitions.
Theorem 1
If the social information has weakly consistent transitions, then every local information cascade triggers a global information cascade.
Proof
Let us consider \(\varvec{g}_0\in \mathcal {G}_n\) such that it produces a local cascade in the nth node. Then, due to Proposition 1, this implies that \(\tau _n(\varvec{g}_0)\notin [L_s,U_s]\) almost surely. This, combined with the weak consistency assumption, implies that \(\tau _{n+1}(\varvec{G}_{n+1})\notin [L_s,U_s]\) almost surely. A second application of Proposition 1 shows that \(\mathbb {P}_{w}\left\{ \pi _{n+1} = 0 | \varvec{G}_{n+1} \right\}\) is equal to 0 or 1. This, in turn, guarantees that \(I(\pi _{n+1};S_{n+1} | \varvec{G}_{n} = \varvec{g}_0) = 0\) almost surely, showing that the \((n+1)\)th node experiences a local information cascade because of \(\varvec{G}_n = \varvec{g}_0.\)
A recursive application of the above argument allows one to prove that \(I(\pi _{n+m};S_{n+m} | \varvec{G}_{n} = \varvec{g}_0) = 0\) for all \(m\ge 0,\) proving the existence of a global cascade. \(\square\)
This theorem has a number of important consequences. Firstly, it provides an intuitive geometrical description of the nature of global cascades for networks with weak consistency. One can imagine the evolution of \(\tau _n(\varvec{G}_n)\) as a function of n as a random walk within the interval \([L_s,U_s].\) Because of the weak consistency condition, if the random walk steps out of the interval, it will never come back. Moreover, as a consequence of this theorem, stepping out of \([L_s,U_s]\) is a necessary and sufficient condition to trigger a global information cascade over the network.
Also, note that when \(G_n = \varvec{X}^n\) (i.e. each node overhears all previous decisions) one can prove that \(G_n\) has weakly invertible transitions. Therefore, Theorem 1 is a generalization of Theorem 1 of [58] to the case of a network with Byzantine nodes.
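The random-walk picture can be made concrete with a small sketch that inspects a given trajectory of decision thresholds. The trajectories and interval bounds below are illustrative values, not outputs of the scheme; the first function locates the step at which the walk leaves the interval, and the second checks the weak consistency condition along the trajectory.

```python
def cascade_onset(thresholds, lower, upper):
    """Index of the first threshold outside [lower, upper], or None."""
    for n, tau in enumerate(thresholds):
        if tau < lower or tau > upper:
            return n
    return None


def respects_weak_consistency(thresholds, lower, upper):
    """Check that a threshold at or beyond a boundary never moves back inside."""
    for prev, cur in zip(thresholds, thresholds[1:]):
        if prev <= lower and cur > lower:
            return False
        if prev >= upper and cur < upper:
            return False
    return True


if __name__ == "__main__":
    lower, upper = -1.0, 1.0  # illustrative values for L_s and U_s
    absorbed = [0.0, 0.4, -0.3, -1.2, -1.5, -1.1]
    returning = [0.0, 1.3, 0.5]
    print(cascade_onset(absorbed, lower, upper),
          respects_weak_consistency(absorbed, lower, upper))
    print(cascade_onset(returning, lower, upper),
          respects_weak_consistency(returning, lower, upper))
```

In the first trajectory the walk leaves the interval at step 3 and stays outside, so under Theorem 1 every later node would be in a cascade; the second trajectory re-enters the interval and therefore violates weak consistency.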
Proof of concept
This section illustrates the main results obtained in the “Social learning as a data aggregation scheme” and “Information cascade” sections in a simple scenario. In the following, the scenario is described in the “Scenario description” section, and numerical simulations are discussed in the “Discussion” section.
Scenario description
Let us consider a sensor network that has surveillance duties over a sensitive geographical area. The sensitive area could correspond to a factory, a drinking water container or a warzone, whose key variables need to be supervised. The task of the sensor network is, through the observation of these variables, to detect the events \(\{W=1\}\) and \(\{W=0\}\) that correspond to the presence or absence of an attack on the surveilled area, respectively. No knowledge about the prior distribution of W is assumed.
We consider nodes that have been deployed randomly over the sensitive area, and hence their locations follow a Poisson point process (PPP). The ratio of the area of interest that falls within the range of each sensor is denoted by r. If attacks occur uniformly over the surveilled area, then r is also the probability of an attack taking place under the coverage area of a particular sensor. Note that, due to the limited sensing range, the miss-detection rate of individual nodes is roughly equal to \(1-r.\) As r is usually a small number (\(5\%\) in our simulations), this implies that each node is extremely unreliable without cooperation.
Each node measures its environment using a digital sensor with a dynamical range of m levels (i.e. \(S_n\in \{0,1,\dots ,m-1\}\)). Under the absence of an attack, the measured signal is assumed to be normally distributed with a particular mean value and variance. For simplicity of the analysis, we assume that when conditioned on \(\{W=0\}\) the signal \(S_n\) is distributed following a binomial distribution of parameters (m, q), i.e. \(\mathbb {P}\left\{ S_n=s|W=0 \right\} = \binom{m}{s} q^s (1-q)^{m-s},\)
which, due to the central limit theorem, approximates a Gaussian variable when m is relatively large. Moreover, it is assumed that the sensor dynamical range is adapted to match the mean value on the lower third of the sensor dynamical range, i.e. \(\mathbb {E} \left\{ S_n |W=0 \right\} = m/3.\) This naturally imposes the requirement \(q=1/3.\)
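The binomial model for the attack-free signal can be checked numerically. The following minimal sketch evaluates the probability mass function of a binomial variable of parameters (m, q) and its first two moments; the helper names and the value m = 12 are illustrative choices, not part of the scheme.

```python
from math import comb


def binomial_pmf(m, q):
    """Probability mass function of a Binomial(m, q) variable on {0, ..., m}."""
    return [comb(m, s) * q**s * (1 - q) ** (m - s) for s in range(m + 1)]


def mean_and_variance(pmf):
    """Mean and variance of a pmf indexed by the signal level."""
    mean = sum(s * p for s, p in enumerate(pmf))
    variance = sum((s - mean) ** 2 * p for s, p in enumerate(pmf))
    return mean, variance


if __name__ == "__main__":
    m, q = 12, 1 / 3
    mean, variance = mean_and_variance(binomial_pmf(m, q))
    print(mean, variance)  # close to m/3 and m*q*(1-q)
```

With q = 1/3 the mean sits at m/3, i.e. on the lower third of the dynamical range, and the variance equals mq(1-q), the quantity that sets the width of the attack-free signal.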
Following standard statistical approaches, it is further assumed that the sensors observe the environment looking for anomalous events, i.e. measurements that exceed the mean value by more than two standard deviations. This may correspond, for example, to a specific chemical compound exceeding safe concentration values, or to too much movement being detected over a given time window (see e.g. [79]). Using the fact that \(\text {Var}\{S_n\} = mq(1-q),\) this gives a threshold \(T = \mathbb {E} \left\{ S_n \right\} + 2 \sqrt{ \text {Var}\{S_n\} } = mq + 2\sqrt{mq(1-q)}.\) Accordingly, it is assumed that an attack is related to the event of \(S_n\) being uniformly distributed in [T, m]. Therefore, one finds that
where H(x) is the discrete Heaviside (step) function given by
In summary, \(S_n\) conditioned on \(\{W=1\}\) is modelled as a mixture model between a Binomial and a truncated uniform distribution, where the relative weight between them is determined by r (cf. Fig. 1, top). Finally, using (21) and (22), the log-likelihood function of the signal \(S_n\) can be determined as (see Fig. 1, bottom)
We are interested in studying how a restricted listening period affects the network performance. Restricted listening periods are usually mandatory for energy-limited IoT devices.^{Footnote 12} For simplicity of the analysis, we focus on scenarios in which a node can overhear the transmissions of all the other nodes, and hence the social information gathered by the nth node is \(\varvec{G}_n = (X_{n-k-1},\dots ,X_{n-1})\) if \(n > k.\) Here k is a design parameter, whose impact on the network performance is studied in the next section.
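The signal model summarized above can be sketched numerically. The code below is an illustration under stated assumptions rather than the exact model: the uniform component is assumed to place equal mass on the integer levels from the smallest integer not below T up to m, the mixture is assumed to give weight r to the uniform component and 1-r to the binomial one, and the log-likelihood is taken as the logarithm of the ratio between the two conditional distributions. The function name `signal_model` is hypothetical.

```python
import math


def signal_model(m, q=1 / 3, r=0.05):
    """Return (p0, p1, llr, threshold) for the assumed binomial/uniform mixture."""
    # Attack-free distribution: Binomial(m, q) on the levels 0..m.
    p0 = [math.comb(m, s) * q**s * (1 - q) ** (m - s) for s in range(m + 1)]
    # Anomaly threshold: mean plus two standard deviations.
    threshold = m * q + 2 * math.sqrt(m * q * (1 - q))
    first = math.ceil(threshold)  # first integer level of the uniform part
    width = m - first + 1         # number of levels covered by the uniform part
    # Assumed mixture under attack: weight r on the truncated uniform component.
    p1 = [
        (1 - r) * p0[s] + (r / width if s >= first else 0.0)
        for s in range(m + 1)
    ]
    # Log-likelihood ratio; for very large m, p0 may underflow at high levels.
    llr = [math.log(p1[s] / p0[s]) for s in range(m + 1)]
    return p0, p1, llr, threshold


if __name__ == "__main__":
    p0, p1, llr, threshold = signal_model(9)
    print(round(threshold, 3), round(min(llr), 4), round(max(llr), 2))
```

For m = 9 the threshold is T ≈ 5.83, so levels 6 to 9 carry the uniform component. Under these assumptions the lowest log-likelihood value is log(1-r) ≈ -0.051, attained below the threshold, and the highest is attained at s = m, so both bounds are finite as required for bounded beliefs.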
Discussion
We analysed the performance of networks of \(N=300\) sensor nodes, each of which can monitor \(r=5\%\) of the target area. Using the definition given in (4) and (5), combined with (16), miss-detection and false alarm rates are computed as
where the terms \(\mathbb {P}_{w}\left\{ \varvec{G}_n=\varvec{g} \right\}\) are computed using Algorithm 1 (cf. “An algorithm for computing the social log-likelihood” section). In order to favour the reduction of miss-detections over false alarms, \(\tau _0=0\) is chosen, as it is the lowest value that still allows a non-trivial inference process.^{Footnote 13} We consider an upper bound of \(5\%\) over the tolerable false alarm rate.
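The selection of the initial threshold under a false alarm constraint can be sketched as a simple search. The code assumes that a false alarm rate can be evaluated for each candidate value of \(\tau _0\) (for instance by running the performance computation above); the function `false_alarm_rate` and the tabulated rates are hypothetical placeholders, not simulation results.

```python
def lowest_feasible_threshold(candidates, false_alarm_rate, max_false_alarm=0.05):
    """Smallest candidate threshold whose false alarm rate meets the bound.

    false_alarm_rate is a hypothetical callable mapping a threshold to a rate.
    Returns None when no candidate satisfies the constraint.
    """
    for tau in sorted(candidates):
        if false_alarm_rate(tau) <= max_false_alarm:
            return tau
    return None


if __name__ == "__main__":
    # Illustrative (made-up) rates: higher thresholds give fewer false alarms.
    toy_rates = {-1: 0.40, 0: 0.12, 1: 0.04, 2: 0.01}
    print(lowest_feasible_threshold(toy_rates, toy_rates.get))
```

Choosing the lowest feasible value favours the detection of attacks, since any higher threshold would reduce false alarms further only at the cost of more miss-detections.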
Simulations demonstrate that the proposed scheme enables strong network resilience in this scenario, allowing the sensor network to maintain a low miss-detection rate even in the presence of a large number of Byzantine nodes (see Fig. 2). Please recall that if a traditional distributed detection scheme based on centralized decision is used, a topology-aware attacker can cause a miss-detection rate of \(100\%\) by just compromising the few nodes that perform data aggregation [i.e. the FC(s)]. Figure 2 shows that nodes that individually would have a miss-detection rate of \(95\%\) can reach a rate of around \(10\%\) even when \(30\%\) of the nodes are under the control of the attacker. Therefore, by making all the nodes aggregate data, the network can overcome the influence of Byzantine nodes, generating correct inferences even when a significant fraction of nodes have been compromised.
Please note that, for the case of data falsification attack illustrated by Fig. 2, the miss-detection rate improves until the network size reaches \(N=500,\) achieving a performance of \(\approx 10^{-12}\) (not shown in the Figure). This result has two important implications. First, it confirms the prediction of Theorem 1 that, if the signal log-likelihood is bounded, then information cascades are eventually dominant, hence stopping the learning process of the network (for a more detailed discussion about this issue see [58]). Secondly, this result stresses a key difference of our approach with respect to the existing literature about information cascades: even if information cascades become dominant and perfect social learning cannot be achieved, the achieved performance can still be very high, and hence useful in a practical information-processing setup.
The network resilience provided by our scheme is influenced by the sensor dynamical range, m, as a higher sensor resolution is likely to provide more discriminative power. Our results show three sharply distinct regimes (see Fig. 3). First, if m is too small (\(m\le 4\)) the network performance is very poor, irrespective of the number of Byzantine nodes. Secondly, if \(8\le m \le 32\) the miss-detection rate without Byzantine nodes is approx. \(10\%\) and is exponentially degraded by the presence of Byzantine nodes. Finally, if \(m\ge 64\) then the performance under no Byzantine nodes is very high, and is degraded super-exponentially by the presence of Byzantine nodes. Interestingly, the point at which the miss-detection rate of this regime goes above \(10^{-1}\) is \(N^*/N=1/3,\) having some resemblance with the well-known 1/3 threshold of the Byzantine generals problem [14]. Also, it is intriguing that variations between 8 and 32 levels in the dynamical range provide practically no performance benefits.
Our results also illustrate the effects of the memory size, k, showing that larger values of k provide great benefits for the network resilience (see Fig. 4). In effect, by performing an optimal Bayesian inference over 8 broadcast signals the network miss-detection rate remains below \(10\%\) up to an attack intensity of \(50\%\) of Byzantine nodes. Unfortunately, the computation and storage requirements of Algorithm 1 grow exponentially with k, and hence using memories beyond \(k=10\) is not practical for resource-limited sensor networks. Overcoming this limitation is an interesting future line of investigation.
Conclusions
Traditional approaches to data aggregation over information networks are based on a strong division of labour, which discriminates between sensing nodes that merely sense and forward data, and FCs that monopolize all the processing and inference capabilities. This generates a single point of failure that is likely to be exploited by smart adversaries, whose interest is the disruption of the network capabilities.
This serious security threat can be overcome by distributing the decision-making process across the network using social learning principles. This approach avoids single points of failure by generating a large number of nodes from where aggregated data can be accessed. In this paper, a social learning data fusion scheme has been proposed, which is suitable to be implemented in sensor networks consisting of devices with limited computational capabilities.
We showed that if the private signals are bounded then each local information cascade triggers a global cascade, extending previous results to the case where an adversary controls a number of Byzantine nodes. This result is highly relevant for sensor networks, as digital sensors are intrinsically bounded, and hence satisfy the assumptions of these results. However, in contrast with the literature, our approach does not focus on the conditions that guarantee perfect asymptotic social learning (i.e. miss-detection and false alarm rates converging to zero), but on whether their limits are small enough for practical applications. Our results show that this is indeed the case, even when the number of overheard transmissions is limited.
Moreover, our results suggest that social learning principles can enable significant resilience of an information network against topology-aware data falsification attacks, which can totally disable the detection capabilities of traditional sensor networks. Furthermore, our results illustrate how the network resilience can persist even when the attacker has compromised a considerable number of nodes.
It is our hope that these results can motivate further explorations on the interface between distributed decision-making, statistical inference and signal processing over technological and social networks and multi-agent systems.
Notes
 1.
The generalization of our framework and results to vector-valued sensor outputs is straightforward.
 2.
The conditional independence of sensor signals is satisfied when the sensor noise is due to local causes (e.g. thermal noise), but does not hold when there exist common noise sources (e.g. in the case of distributed acoustic sensors [61]). For works that consider sensor interdependence see [62,63,64,65,66].
 3.
When \(S_n\) takes a finite number of values then \(\frac{{\text{d}} \mu _1}{{\text{d}} \mu _0} (s) = \frac{ \mathbb {P}\left\{ S_n=s|W=1 \right\} }{ \mathbb {P}\left\{ S_n=s|W=0 \right\} },\) while if \(S_n\) is a continuous random variable with conditional p.d.f. \(p(S_n|W=w)\) then \(\frac{{\text{d}} \mu _1}{{\text{d}} \mu _0} (s) = \frac{ p(s|W=1) }{ p(s|W=0) }.\)
 4.
Note that the synchronization requirements of this procedure are low, so standard techniques can be used to keep the nodes’ local clocks within the required synchronization constraints (see e.g. [69]).
 5.
This attack model implicitly assumes that the capture of each node is an independent event. Extensions considering cyber-infection propagation properties are possible (cf. [71]) and are left for future studies.
 6.
Although Bayesian models are elegant and tractable, they assume agents act always rationally [74] and make strong assumptions on the knowledge agents have about posterior probabilities [49]. However, Bayesian models provide an important benchmark, not necessarily due to their accuracy but because they give an important reference point with which other models can be compared [35].
 7.
As the prior distribution of W is usually unknown, \(\tau _0\) is a free parameter of the scheme. Following the discussion in “Problem statement” section, the network operator shall select the lowest value of \(\tau _0\) that satisfies the required false alarm rate specified by the Neyman–Pearson criteria.
 8.
Recall that \(S_n\) and \(\varvec{G}_n\) are conditionally independent given \(W=w\) (cf. “Data fusion rule” section), and hence there cannot be redundant information about W that is conveyed by \(S_n\) and also \(\varvec{G}_n.\) For a more detailed discussion about redundant information see [77].
 9.
The essential supremum is the smallest upper bound over \(\Lambda _S(S_n)\) that holds almost surely, being the natural measure-theoretic extension of the notion of supremum [78].
 10.
Note that the condition \(\mathbb {P}_{w}\left\{ \varvec{G}_n=\varvec{g}|\varvec{G}_{n-1}=\varvec{g'} \right\} >0\) is equivalent to either \(\beta _w^n(\varvec{g},0, \varvec{g'})\) or \(\beta _w^n(\varvec{g},1, \varvec{g'})\) being strictly positive.
 11.
It is possible to build examples where weak consistency does not follow from strong consistency when \(p_{\text{b}}>0.\)
 12.
It is well known that the wireless radios of small sensor nodes consume a similar amount of energy while transmitting or receiving data, and hence reducing overhearing periods is key for attaining energy efficiency, and hence long network lifetime [60].
 13.
Simulations showed that if \(\tau <0\) then \(X_n=1\) for all \(n\in \mathbb {N}\) independently of the value of W, triggering a premature information cascade.
References
 1.
Kim KD, Kumar PR. Cyber–physical systems: a perspective at the centennial. Proc IEEE. 2012;100(Special Centennial Issue):1287–308.
 2.
Response SS. What you need to know about the WannaCry Ransomware. https://www.symantec.com/blogs/threatintelligence/wannacryransomwareattack
 3.
Veeravalli VV, Varshney PK. Distributed inference in wireless sensor networks. Philos Trans R Soc Lond A. 2012;370(1958):100–17.
 4.
Barbarossa S, Sardellitti S, Di Lorenzo P. Distributed detection and estimation in wireless. Academic Press library in signal processing: communications and radar signal processing. London: Academic Press; 2013. p. 329.
 5.
Hancke GP, Hancke GP Jr. The role of advanced sensing in smart cities. Sensors. 2012;13(1):393–425.
 6.
Difallah DE, Cudre-Mauroux P, McKenna SA. Scalable anomaly detection for smart city infrastructure networks. IEEE Internet Comput. 2013;17(6):39–47.
 7.
Lambrou TP, Panayiotou CG, Polycarpou MM. Contamination detection in drinking water distribution systems using sensor networks. In: Control Conference (ECC), 2015 European. New York: IEEE; 2015. p. 3298–303.
 8.
Lambrou TP, Anastasiou CC, Panayiotou CG, Polycarpou MM. A low-cost sensor network for real-time monitoring and contamination detection in drinking water distribution systems. IEEE Sens J. 2014;14(8):2765–72.
 9.
Perrig A, Stankovic J, Wagner D. Security in wireless sensor networks. Commun ACM. 2004;47(6):53–7.
 10.
Shi E, Perrig A. Designing secure sensor networks. IEEE Wirel Commun. 2004;11(6):38–43.
 11.
Pathan ASK, Lee HW, Hong CS. Security in wireless sensor networks: issues and challenges. In: The 8th international conference of advanced communication technology, 2006. ICACT 2006, vol. 2. New York: IEEE; 2006. p. 6.
 12.
Trappe W, Howard R, Moore RS. Low-energy security: limits and opportunities in the internet of things. IEEE Secur Priv. 2015;13(1):14–21. https://doi.org/10.1109/MSP.2015.7.
 13.
Marano S, Matta V, Tong L. Distributed detection in the presence of Byzantine attacks. IEEE Trans Signal Process. 2009;57(1):16–29.
 14.
Lamport L, Shostak R, Pease M. The Byzantine generals problem. ACM Trans Program Lang Syst (TOPLAS). 1982;4(3):382–401.
 15.
Vempaty A, Tong L, Varshney PK. Distributed inference with Byzantine data: state-of-the-art review on data falsification attack. IEEE Signal Process Mag. 2013;30(5):65–75.
 16.
Nadendla VSS, Han YS, Varshney PK. Distributed inference with M-ary quantized data in the presence of Byzantine attacks. IEEE Trans Signal Process. 2014;62(10):2681–95. https://doi.org/10.1109/TSP.2014.2314072.
 17.
Zhang J, Blum RS, Lu X, Conus D. Asymptotically optimum distributed estimation in the presence of attacks. IEEE Trans Signal Process. 2015;63(5):1086–101. https://doi.org/10.1109/TSP.2014.2386281.
 18.
Kailkhura B, Han YS, Brahma S, Varshney PK. Distributed Bayesian detection in the presence of Byzantine data. IEEE Trans Signal Process. 2015;63(19):5250–63. https://doi.org/10.1109/TSP.2015.2450191.
 19.
Kailkhura B, Brahma S, Han YS, Varshney PK. Distributed detection in tree topologies with Byzantines. IEEE Trans Signal Process. 2014;62(12):3208–19.
 20.
Kailkhura B, Brahma S, Dulek B, Han YS, Varshney PK. Distributed detection in tree networks: Byzantines and mitigation techniques. IEEE Trans Inf Forensics Secur. 2015;10(7):1499–512. https://doi.org/10.1109/TIFS.2015.2415757.
 21.
Chen KC, Lien SY. Machine-to-machine communications: technologies and challenges. Ad Hoc Netw. 2014;18:3–23.
 22.
Parno B, Perrig A, Gligor V. Distributed detection of node replication attacks in sensor networks. In: 2005 IEEE symposium on security and privacy (S&P’05). New York: IEEE; 2005. p. 49–63.
 23.
Lin SC, Chen KC. Improving spectrum efficiency via in-network computations in cognitive radio sensor networks. IEEE Trans Wirel Commun. 2014;13(3):1222–34.
 24.
Daniels BC, Ellison CJ, Krakauer DC, Flack JC. Quantifying collectivity. Curr Opin Neurobiol. 2016;37:106–13.
 25.
Brush ER, Krakauer DC, Flack JC. Conflicts of interest improve collective computation of adaptive social structures. Sci Adv. 2018;4(1):1603311.
 26.
Tsitsiklis JN. Decentralized detection. Adv Stat Signal Process. 1993;2(2):297–344.
 27.
Viswanathan R, Varshney PK. Distributed detection with multiple sensors I. Fundamentals. Proc IEEE. 1997;85(1):54–63.
 28.
Blum RS, Kassam SA, Poor HV. Distributed detection with multiple sensors II. Advanced topics. Proc IEEE. 1997;85(1):64–79.
 29.
Chen B, Tong L, Varshney PK. Channel aware distributed detection in wireless sensor networks. IEEE Signal Process Mag. 2006;23(4):16–26.
 30.
Chamberland JF, Veeravalli VV. Wireless sensors in distributed detection applications. IEEE Signal Process Mag. 2007;24(3):16–25.
 31.
Tsitsiklis J, Athans M. On the complexity of decentralized decision making and detection problems. IEEE Trans Autom Control. 1985;30(5):440–6.
 32.
Warren D, Willett P. Optimum quantization for detector fusion: some proofs, examples, and pathology. J Franklin Inst. 1999;336(2):323–59.
 33.
Chamberland JF, Veeravalli VV. Asymptotic results for decentralized detection in power constrained wireless sensor networks. IEEE J Sel Areas Commun. 2004;22(6):1007–15.
 34.
Easley D, Kleinberg J. Networks, crowds, and markets, vol. 1(2.1). Cambridge: Cambridge University Press; 2010. p. 2–1.
 35.
Acemoglu D, Ozdaglar A. Opinion dynamics and learning in social networks. Dyn Games Appl. 2011;1(1):3–49.
 36.
Banerjee AV. A simple model of herd behavior. Q J Econ. 1992;107:797–817.
 37.
Bikhchandani S, Hirshleifer D, Welch I. A theory of fads, fashion, custom, and cultural change as informational cascades. J Political Econ. 1992;100:992–1026.
 38.
Bikhchandani S, Hirshleifer D, Welch I. Learning from the behavior of others: conformity, fads, and informational cascades. J Econ Perspect. 1998;12(3):151–70.
 39.
Smith L, Sørensen P. Pathological outcomes of observational learning. Econometrica. 2000;68(2):371–98.
 40.
Bala V, Goyal S. Conformism and diversity under social learning. Econ Theory. 2001;17(1):101–20.
 41.
Banerjee A, Fudenberg D. Word-of-mouth learning. Games Econ Behav. 2004;46(1):1–22.
 42.
Gale D, Kariv S. Bayesian learning in social networks. Games Econ Behav. 2003;45(2):329–46.
 43.
Gill D, Sgroi D. Sequential decisions with tests. Games Econ Behav. 2008;63(2):663–78.
 44.
Acemoglu D, Dahleh MA, Lobel I, Ozdaglar A. Bayesian learning in social networks. Rev Econ Stud. 2011;78(4):1201–36.
 45.
Hsiao J, Chen KC. Steering information cascades in a social system by selective rewiring and incentive seeding. In: to Be included in 2016 IEEE international conference on communications (ICC) 2016.
 46.
DeMarzo PM, Zwiebel J, Vayanos D. Persuasion bias, social influence, and unidimensional opinions. In: Social Influence, and UniDimensional Opinions (November 2001). MIT Sloan Working Paper (433901). 2001.
 47.
Golub B, Jackson MO. Naive learning in social networks and the wisdom of crowds. Am Econ J. 2010;2(1):112–49.
 48.
Acemoglu D, Ozdaglar A, Parandeh-Gheibi A. Spread of (mis) information in social networks. Games Econ Behav. 2010;70(2):194–227.
 49.
Jadbabaie A, Molavi P, Sandroni A, Tahbaz-Salehi A. Non-Bayesian social learning. Games Econ Behav. 2012;76(1):210–25.
 50.
Lalitha A, Sarwate A, Javidi T. Social learning and distributed hypothesis testing. In: 2014 IEEE international symposium on information theory. New York: IEEE; 2014. p. 551–5.
 51.
Rhim JB, Goyal VK. Distributed hypothesis testing with social learning and symmetric fusion. IEEE Trans Signal Process. 2014;62(23):6298–308.
 52.
Huang SL, Chen KC. Information cascades in social networks via dynamic system analyses. In: 2015 IEEE international conference on communications (ICC); 2015. p. 1262–7. https://doi.org/10.1109/ICC.2015.7248496.
 53.
Castro R, Coates M, Liang G, Nowak R, Yu B. Network tomography: recent developments. Stat sci. 2004;19:499–517.
 54.
Viswanathan R, Thomopoulos SC, Tumuluri R. Optimal serial distributed decision fusion. IEEE Trans Aerospace Electron Syst. 1988;24(4):366–76.
 55.
Papastavrou JD, Athans M. Distributed detection by a large team of sensors in tandem. IEEE Trans Aerospace Electron Syst. 1992;28(3):639–53.
 56.
Swaszek PF. On the performance of serial networks in distributed detection. IEEE Trans Aerospace Electron Syst. 1993;29(1):254–60.
 57.
Bahceci I, AlRegib G, Altunbasak Y. Serial distributed detection for wireless sensor networks. In: Proceedings. International symposium on information theory, ISIT 2005. New York: IEEE; 2005. p. 830–4.
 58.
Rosas F, Hsiao JH, Chen KC. A technological perspective on information cascades via social learning. IEEE Access. 2017;5:22605–33.
 59.
Rosas F, Chen KC. Social learning against data falsification in sensor networks. In: International workshop on complex networks and their applications. New York: Springer; 2017. p. 704–16.
 60.
Rosas F, Oberli C. Modulation and SNR optimization for achieving energy-efficient communications over short-range fading channels. IEEE Trans Wirel Commun. 2012;11(12):4286–95.
 61.
Bertrand A. Applications and trends in wireless acoustic sensor networks: a signal processing perspective. In: 2011 18th IEEE symposium on communications and vehicular technology in the Benelux (SCVT); 2011. p. 1–6. https://doi.org/10.1109/SCVT.2011.6101302.
 62.
Kam M, Zhu Q, Gray WS. Optimal data fusion of correlated local decisions in multiple sensor detection systems. IEEE Trans Aerospace Electron Syst. 1992;28(3):916–20.
 63.
Chen JG, Ansari N. Adaptive fusion of correlated local decisions. IEEE Trans Syst Man Cybern Part C (Appl Rev). 1998;28(2):276–81.
 64.
Willett P, Swaszek PF, Blum RS. The good, bad and ugly: distributed detection of a known signal in dependent Gaussian noise. IEEE Trans Signal Process. 2000;48(12):3266–79.
 65.
Chamberland JF, Veeravalli VV. How dense should a sensor network be for detection with correlated observations? IEEE Trans Inf Theory. 2006;52(11):5099–106.
 66.
Sundaresan A, Varshney PK, Rao NS. Copula-based fusion of correlated decisions. IEEE Trans Aerospace Electron Syst. 2011;47(1):454–71.
 67.
Loeve M. Probability theory, vol. 1. New York: Springer; 1978.
 68.
Karl H, Willig A. Protocols and architectures for wireless sensor networks. Chichester: Wiley; 2007.
 69.
Sundararaman B, Buy U, Kshemkalyani AD. Clock synchronization for wireless sensor networks: a survey. Ad Hoc Netw. 2005;3(3):281–323.
 70.
Rosas F, Brante G, Souza RD, Oberli C. Optimizing the code rate for achieving energy-efficient wireless communications. In: Wireless communications and networking conference (WCNC), 2014 IEEE. New York: IEEE; 2014. p. 775–80.
 71.
Karyotis V, Khouzani M. Malware diffusion models for modern complex networks: theory and applications. Cambridge: Morgan Kaufmann; 2016.
 72.
Poor HV. An introduction to signal detection and estimation. Berlin, Heidelberg: Springer; 2013.
 73.
Smith P, Hutchison D, Sterbenz JP, Schöller M, Fessi A, Karaliopoulos M, Lac C, Plattner B. Network resilience: a systematic approach. IEEE Commun Mag. 2011;49(7):88–97.
 74.
Shiller RJ. Conversation, information, and herd behavior. Am Econ Rev. 1995;85(2):181–5.
 75.
Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB. Bayesian data analysis. Boca Raton: CRC Press; 2014.
 76.
Cover TM, Thomas JA. Elements of information theory. New Jersey: Wiley; 2012.
 77.
Rosas F, Ntranos V, Ellison CJ, Pollin S, Verhelst M. Understanding interdependency through complex information sharing. Entropy. 2016;18(2):38.
 78.
Dieudonne J. Treatise on analysis, vol. II. New York: Associated Press; 1976.
 79.
McKenna SA, Wilson M, Klise KA. Detecting changes in water quality data. J Am Water Works Assoc. 2008;100(1):74.
Authors’ contributions
All the authors participated in the development of the concepts and the writing of the manuscript. All authors read and approved the final manuscript.
Acknowledgements
Fernando Rosas is supported by the European Union’s H2020 research and innovation programme, under the Marie Skłodowska-Curie Grant Agreement No. 702981.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Properties of \(F_w^\Lambda\)
For simplicity, let us consider the case of real-valued signals, i.e. \(S_n\in \mathbb {R}.\) In this case, the c.d.f. of the signal likelihood is given by
\[F_w^\Lambda (y) = \mathbb {P}_{w}\left\{ \Lambda _s(S_n) \le y \right\} = \mathbb {P}_{w}\left\{ S_n \in \mathcal {S}^y \right\} , \tag{27}\]
where \(\mathcal {S}^y = \{ x\in \mathbb {R} \mid \Lambda _s(x) \le y \}.\) If \(\Lambda _s\) is an increasing function, then \(\mathcal {S}^y=\{x\in \mathbb {R} \mid x \le \Lambda _s^{-1}(y) \} = (-\infty , \Lambda _s^{-1}(y) ]\) and hence
\[F_w^\Lambda (y) = H_w\left( \Lambda _s^{-1}(y)\right) ,\]
where \(H_w(s)\) is the cumulative distribution function (c.d.f.) of \(S_n\) for \(W=w.\) For the general case where \(\Lambda _s\) is an arbitrary (piecewise continuous) function, \(\mathcal {S}^y\) can be expressed as a union of intervals. Then \(\cup _{j=1}^{\infty } [a_j(y),b_j(y)] = \mathcal {S}^y\) (note that \(\Lambda _s(a_j(y))=\Lambda _s(b_k(y))=y\)) and hence from (27) it is clear that, for disjoint intervals and a continuous \(H_w,\)
\[F_w^\Lambda (y) = \sum _{j=1}^{\infty } \left[ H_w(b_j(y)) - H_w(a_j(y)) \right] .\]
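The two expressions for \(F_w^\Lambda\) can be checked numerically. The following is a minimal sketch under assumptions that are not part of the paper: the signal is Gaussian with unit variance and hypothetical means `MEANS[w]`, the increasing \(\Lambda _s\) is taken to be the Gaussian log-likelihood ratio, and the non-monotone case uses the illustrative statistic \(x^2,\) for which \(\mathcal {S}^y = [-\sqrt{y},\sqrt{y}].\) All function names are illustrative.

```python
import math
import random

# Hypothetical signal model (an assumption): S_n ~ N(MEANS[w], 1) under W = w.
MEANS = {0: 0.0, 1: 1.0}


def normal_cdf(x, mean):
    """C.d.f. H_w of a unit-variance Gaussian signal, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0)))


def llr(x):
    """Assumed increasing Lambda_s: Gaussian log-likelihood ratio."""
    m0, m1 = MEANS[0], MEANS[1]
    return (m1 - m0) * x - (m1 ** 2 - m0 ** 2) / 2.0


def llr_inverse(y):
    """Inverse of llr, valid because llr is strictly increasing (m1 > m0)."""
    m0, m1 = MEANS[0], MEANS[1]
    return (y + (m1 ** 2 - m0 ** 2) / 2.0) / (m1 - m0)


def cdf_increasing(y, w):
    """F_w(y) = H_w(Lambda_s^{-1}(y)) for an increasing Lambda_s."""
    return normal_cdf(llr_inverse(y), MEANS[w])


def cdf_from_intervals(intervals, w):
    """F_w(y) as the sum of H_w(b_j) - H_w(a_j) over disjoint intervals of S^y."""
    return sum(normal_cdf(b, MEANS[w]) - normal_cdf(a, MEANS[w])
               for a, b in intervals)


def empirical_cdf(statistic, y, w, samples=100_000, seed=0):
    """Monte Carlo estimate of P_w{statistic(S_n) <= y}."""
    rng = random.Random(seed)
    hits = sum(statistic(rng.gauss(MEANS[w], 1.0)) <= y for _ in range(samples))
    return hits / samples


if __name__ == "__main__":
    y = 0.3
    print(cdf_increasing(y, 1), empirical_cdf(llr, y, 1))
    r = math.sqrt(1.5)
    print(cdf_from_intervals([(-r, r)], 0),
          empirical_cdf(lambda x: x * x, 1.5, 0))
```

In both cases the closed-form value and the Monte Carlo estimate should agree up to sampling error.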
Appendix B: Proof of Lemma 2
Proof
Let us assume that the process \(\varvec{G}_n\) has strongly consistent transitions and consider \(\varvec{g'}\in \mathcal {G}_{n-1}\) such that \(\tau _{n-1}(\varvec{g'}) \le L_s.\) Note that, under these conditions, \(F_w^\Lambda (\tau _{n-1}(\varvec{g'})) = 0,\) and hence
holds for any \(w\in \{0,1\}.\) Moreover, this allows us to find that
Therefore, due to the strongly consistent transition property, if \(\mathbb {P}_{w}\left\{ \varvec{G}_n=\varvec{g} \mid \varvec{G}_{n-1}=\varvec{g'} \right\} = \beta _w^n(\varvec{g} \mid 1,\varvec{g'}) > 0\) then
proving the weakly consistent transition property. The proof for the case of \(\tau _{n-1}(\varvec{g'}) \ge U_s\) is analogous. \(\square\)
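The chain of equalities behind this argument can be sketched as follows. This is a reading under assumptions that this appendix does not restate: agent \(n\) decides \(X_n = 1\) exactly when \(\Lambda _s(S_n) > \tau _{n-1}(\varvec{G}_{n-1}),\) the signal \(S_n\) is independent of \(\varvec{G}_{n-1}\) given \(W,\) and \(\beta _w^n(\varvec{g} \mid x,\varvec{g'})\) denotes the probability of \(\varvec{G}_n=\varvec{g}\) given \(X_n = x\) and \(\varvec{G}_{n-1}=\varvec{g'}.\)

```latex
\begin{align*}
\mathbb{P}_w\{X_n = 0 \mid \boldsymbol{G}_{n-1} = \boldsymbol{g'}\}
  &= F_w^\Lambda\big(\tau_{n-1}(\boldsymbol{g'})\big) = 0, \\
\mathbb{P}_w\{\boldsymbol{G}_n = \boldsymbol{g} \mid \boldsymbol{G}_{n-1} = \boldsymbol{g'}\}
  &= \sum_{x \in \{0,1\}} \beta_w^n(\boldsymbol{g} \mid x, \boldsymbol{g'})\,
     \mathbb{P}_w\{X_n = x \mid \boldsymbol{G}_{n-1} = \boldsymbol{g'}\}
   = \beta_w^n(\boldsymbol{g} \mid 1, \boldsymbol{g'}).
\end{align*}
```

Under these assumptions only the \(x=1\) term of the sum survives, which would explain why the transition probability reduces to a single \(\beta _w^n\) term.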
Appendix C: Proof of Lemma 3
Proof
Let us consider \(\varvec{g}_0\in \mathcal {G}_{n}\) such that \(\tau _{n}(\varvec{g}_0) \notin [L_s,U_s].\) Then, due to the weakly invertible evolution, for each \(x\in \{0,1\}\) there exists \(\varvec{g}(x)\in \mathcal {G}_{n+1}\) such that
Moreover, note that while the deterministic assumption implies that the event \(\{\varvec{G}_{n}=\varvec{g}_0\}\) could be followed by either \(\{\varvec{G}_{n+1}=\varvec{g}(0)\}\) or \(\{\varvec{G}_{n+1}=\varvec{g}(1)\},\) the 1–1 assumption requires that \(\varvec{g}(0) = \varvec{g}(1).\) With this, note that
Above, (34) is a consequence of \(\varvec{g}(0) = \varvec{g}(1),\) while (35) follows from the 1–1 condition on the dynamics. Finally, to justify (36) let us first consider
Because \(\tau _n(\varvec{g}_0) \notin [L_s,U_s],\) \(F_w^\Lambda (\tau _n(\varvec{g}_0))\) is either 0 or 1; in either case it does not depend on W. This, in turn, means that \(\mathbb {P}_{1}\left\{ X_n=x \mid \varvec{G}_n=\varvec{g}_0 \right\} = \mathbb {P}_{0}\left\{ X_n=x \mid \varvec{G}_n=\varvec{g}_0 \right\},\) which explains how (36) is obtained.
Please note that (36) shows that, once \(\tau _n\) leaves \([L_s,U_s],\) it keeps a constant value. This, in turn, shows that weakly deterministic transitions satisfy the weak consistency condition. \(\square\)
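The key step of this proof, namely that \(F_w^\Lambda (\tau )\) is 0 or 1 for both values of \(w\) once \(\tau\) lies outside \([L_s,U_s],\) can be illustrated with a small discrete example. This is a minimal sketch under assumptions that are not taken from the paper: a hypothetical three-valued signal with probability mass functions `PMF[w]`, \(\Lambda _s\) taken to be the log-likelihood ratio, and a decision rule \(X = 1\) exactly when \(\Lambda _s(S_n) > \tau.\) Thresholds are chosen strictly outside the interval to avoid the atoms that a discrete signal has at its endpoints.

```python
import math

# Hypothetical three-valued signal (an assumption): PMF[w][s] = P_w{S_n = s}.
PMF = {
    0: {"a": 0.5, "b": 0.3, "c": 0.2},
    1: {"a": 0.2, "b": 0.3, "c": 0.5},
}


def llr(s):
    """Assumed Lambda_s: log-likelihood ratio of the signal value s."""
    return math.log(PMF[1][s] / PMF[0][s])


# Smallest and largest values that the log-likelihood ratio can take.
L_S = min(llr(s) for s in PMF[0])
U_S = max(llr(s) for s in PMF[0])


def likelihood_cdf(tau, w):
    """F_w(tau) = P_w{Lambda_s(S_n) <= tau}."""
    return sum(p for s, p in PMF[w].items() if llr(s) <= tau)


def decision_pmf(tau, w):
    """Distribution of X in {0, 1}, assuming X = 1 iff Lambda_s(S_n) > tau."""
    f = likelihood_cdf(tau, w)
    return {0: f, 1: 1.0 - f}


if __name__ == "__main__":
    for tau in (L_S - 0.1, 0.0, U_S + 0.1):
        print(tau, decision_pmf(tau, 0), decision_pmf(tau, 1))
```

For thresholds below \(L_s\) or above \(U_s\) the decision distribution is the same under both hypotheses, so the decision carries no information about \(W\); for a threshold inside the interval the two distributions differ.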
Appendix D: List of symbols
Table 2 presents a summary of the notation and symbols used in this work.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Rosas, F., Chen, K. & Gündüz, D. Social learning for resilient data fusion against data falsification attacks. Comput Soc Netw 5, 10 (2018). https://doi.org/10.1186/s40649-018-0057-7
Keywords
 Distributed decision-making
 Data fusion
 Sensor networks
 Social networks
 Data falsification attacks
 Byzantine nodes
 Collective behaviour
 Multi-agent systems
 Social learning
 Information cascades