Book Review: Data-Driven Security: Analysis, Visualization and Dashboards

samzenpus posted about 2 months ago | from the read-all-about-it dept.

Books 26

benrothke writes There is a not so fine line between data dashboards and other information displays that provide pretty but otherwise useless and unactionable information, and those that provide effective answers to key questions. Data-Driven Security: Analysis, Visualization and Dashboards is all about the latter. In this extremely valuable book, authors Jay Jacobs and Bob Rudis show you how to find security patterns in your data logs and extract enough information from them to create effective information security countermeasures. By using data correctly and truly understanding what that data means, the authors show how you can achieve much greater levels of security. Keep reading for the rest of Ben's review.

The book is meant for a serious reader who is willing to put in the time and effort to learn the programming necessary (mainly in Python and R) to truly understand what information exists deep in the recesses of their logs. R is a GNU project: a free software programming language and environment for statistical computing and graphics. It is widely used among statisticians and data miners for developing statistical software and data analysis. For analysis at the level Jacobs and Rudis prescribe, R is a godsend.

After completing the book, the reader will know which questions to ask to gain security insights, and how to use that data to ensure the overall security of their data and networks. Getting to that level is not at all a trivial task, even if there are vendors who promise to do that.

For many people performing data analysis, the dependable Excel spreadsheet is their basic choice for data manipulation. The book calls the spreadsheet a gateway tool between a text editor and programming, and notes that spreadsheets work as long as the data is not too large or complex. It quotes a 2013 report to shareholders from J.P. Morgan in which part of the firm's $6 billion in 2012 losses was attributed to problems with its Excel spreadsheets.

The authors suggest using Excel as a temporary solution for quick one-shot tasks. For those who have repeating analytical tasks or models that are used repeatedly, it's best to move to some type of structured programming language, specifically those that the book suggests and for which it provides a significant number of code examples, all of which are available on the companion website.

The goal of all data extraction is to use data analysis to answer real questions. A large part of the book focuses on how to ask the right question. In chapter 1, the authors write that every good data analysis project begins with setting a goal and creating one or more research questions. Without a well-formed question guiding the analysis, you may be wasting time and energy seeking convenient answers in the data, or worse, you may end up answering a question that nobody was asking in the first place.

The value of the book is that it shows the reader how to focus on context and purpose of the data analysis by setting the research question appropriately; rather than simply parsing large amounts of data. It's ultimately irrelevant if you can use Hadoop to process petabytes of data if you don't know what you are looking for.

Visualization is a large part of what this book is about, and in chapter 6, Visualizing Security Data, the book notes that the most efficient path to human understanding is via the visual sense. It goes on to detail the many advantages data visualization has, and the key to making it work.

As important as visualization is, describing the data is equally important. In chapter 7, the book introduces the VERIS (Vocabulary for Event Recording and Incident Sharing) framework. VERIS is a set of metrics designed to provide a common language for describing security incidents in a structured and repeatable manner. VERIS helps organizations collect useful incident-related information and share that information, anonymously and responsibly, with others.

The book shows how you can use dashboards for effective data visualization. But the authors warn that a dashboard is not an art show. They caution that given the graphical nature of dashboards, it's easy to fall into the trap of making them look like pieces of modern or fringe art; when they are far more akin to architectural and industrial diagrams that require more controlled, deliberate and constrained design.

As to dashboards the authors do not like, they consider the Cyber Security Situational Awareness dashboard to be glitzy but not informative. Personally, I thought the dashboard had a lot of good information.

The book uses the definition of dashboard according to Stephen Few, in that it's a "visual display of the most important information needed to achieve one or more objectives that has been consolidated in a single computer screen so it can be monitored at a glance". The book enables the reader to create dashboards like that.

Data-Driven Security: Analysis, Visualization and Dashboards is a superb book written by two experts who provide significant amounts of valuable information in every chapter. Those who are willing to put the time and effort into the serious amount of work that the book requires will find it a vital resource that will certainly help them achieve much higher levels of security.

Reviewed by Ben Rothke.

You can purchase Data-Driven Security: Analysis, Visualization and Dashboards from amazon.com. Slashdot welcomes readers' book reviews (sci-fi included) -- to see your own review here, read the book review guidelines, then visit the submission page. If you'd like to see what books we have available from our review library please let us know.

26 comments

Question, what does R do that other lingos cannot? (0)

Anonymous Coward | about 2 months ago | (#47401657)

Does it just have statistical functions built in and ready to go?

Re:Question, what does R do that other lingos cann (3, Informative)

vux984 (928602) | about 2 months ago | (#47401851)

Question, what does R do that other lingos cannot?

Nothing. I'm sure other languages can do everything R can do.

Does it just have statistical functions built in and ready to go?

It does have that, along with an active community and growing popularity in scientific circles, so there is lots of cutting-edge, interesting work being done with R -- and a lot of it is free and open source. Plus it has multi-core support in several libraries, and even GPU support in some.

Re:Question, what does R do that other lingos cann (1)

majid_aldo (812530) | about 2 months ago | (#47402363)

Question, what does R do that other lingos cannot?

Nothing. I'm sure other languages can do everything R can do.

Does it just have statistical functions built in and ready to go?

It does have that, along with an active community and growing popularity in scientific circles, so there is lots of cutting-edge, interesting work being done with R -- and a lot of it is free and open source. Plus it has multi-core support in several libraries, and even GPU support in some.

since it has cutting-edge stat functions that's plenty of functionality that R has that other languages DON'T have.

Re:Question, what does R do that other lingos cann (1)

vux984 (928602) | about 2 months ago | (#47402659)

since it has cutting-edge stat functions that's plenty of functionality that R has that other languages DON'T have.

MATLAB, Python and other languages have stuff in the same class as R. R is particularly well suited for stats functionality... but it is not UNIQUELY suited for it.

Re:Question, what does R do that other lingos cann (0)

Anonymous Coward | about 2 months ago | (#47402817)

that is correct ...but MATLAB is expensive. R is a free and open framework.

Re:Question, what does R do that other lingos cann (0)

Anonymous Coward | about 2 months ago | (#47404161)

I hope you appreciated that softball. ;)

true. All languages can do exactly the same things (1)

raymorris (2726007) | about 2 months ago | (#47403285)

Question, what does R do that other lingos cannot?

Nothing. I'm sure other languages can do everything R can do.

This is an interesting point, which I'm going to veer slightly off topic with. All general purpose programming languages* can do _precisely_ the same things. All fit the requirements to be "Turing complete". ANY Turing complete language "A" can emulate any other Turing complete language "B", and therefore "A" can do anything that "B" can do. Since "B" can also emulate "A", the two languages can do precisely the same things. (Church-Turing thesis). An interesting example of this is that JavaScript can do everything that CPU microcode can do, as shown at http://bellard.org/jslinux/ [bellard.org] .

Therefore, the question is never "which language can do more", it's always "which language can do it most quickly, most securely, etc." C is often faster than Java for many operations. R is more convenient for statistics, PHP 5.3 makes security bugs less likely than PHP 4.0, but all of those languages can run the exact same programs.

Contrast HTML and XML, which being markup languages rather than general purpose programming languages, are not Turing complete. Standard regexs are also not Turing complete, though Perl's extended regexs very well may be.

Re:true. All languages can do exactly the same thi (1)

vux984 (928602) | about 2 months ago | (#47403899)

All general purpose programming languages* can do _precisely_ the same things.

For a rather broad and mathematically abstract definition of "precisely".

The Church-Turing thesis applies to computers and computation in the abstract. Actual computer languages on actual hardware may theoretically be able to do the same things in an abstract sense, but not necessarily do precisely the same things with the actual physical hardware they run on.

Not necessarily due to the language itself, but the nature of how they are compiled, interpreted, and/or otherwise used in practice.

Re:true. All languages can do exactly the same thi (0)

Anonymous Coward | about 2 months ago | (#47404529)

wateva....

the truth is that R is a specialized language...and for this purposes...it WORKS!

really does.

Exmpl? If an interpreter for A can be written in B (1)

raymorris (2726007) | about 2 months ago | (#47406567)

If an interpreter for language A can be written in language B, then B can therefore do everything A does, by running that interpreter. Do you have an example in mind of two languages that can do very different things?
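To make the "interpreter for A written in B" argument concrete, here is a minimal sketch, not taken from the thread: a Python interpreter for Brainfuck, a well-known eight-instruction language that is Turing complete given an unbounded tape. The function name `run_bf` and the 30,000-cell tape size are assumptions chosen for illustration; a real Turing machine tape has no fixed length.

```python
def run_bf(program: str, input_bytes: bytes = b"") -> bytes:
    """Interpret a Brainfuck program and return the bytes it outputs."""
    # Pre-compute matching bracket positions so loops can jump directly.
    jumps = {}
    stack = []
    for pos, ch in enumerate(program):
        if ch == "[":
            stack.append(pos)
        elif ch == "]":
            start = stack.pop()
            jumps[start] = pos
            jumps[pos] = start

    tape = [0] * 30000  # assumption: fixed-size tape, conventional for this language
    ptr = 0             # data pointer into the tape
    pc = 0              # program counter into the source text
    inp = iter(input_bytes)
    out = bytearray()

    while pc < len(program):
        ch = program[pc]
        if ch == ">":
            ptr += 1
        elif ch == "<":
            ptr -= 1
        elif ch == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif ch == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif ch == ".":
            out.append(tape[ptr])
        elif ch == ",":
            tape[ptr] = next(inp, 0)  # read one byte, or 0 at end of input
        elif ch == "[" and tape[ptr] == 0:
            pc = jumps[pc]            # skip the loop body
        elif ch == "]" and tape[ptr] != 0:
            pc = jumps[pc]            # repeat the loop body
        pc += 1
    return bytes(out)


# 8 * 8 + 1 = 65, the ASCII code for "A".
result = run_bf("++++++++[>++++++++<-]>+.")
print(result)
```

Anything the interpreted language computes, Python now computes too, which is the sense of "can do everything" used here. Note that the interpreted program only sees the `tape` list and the input and output buffers handed to it, not the host machine.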

Re:Exmpl? If an interpreter for A can be written i (1)

vux984 (928602) | about 2 months ago | (#47407977)

If an interpreter for language A can be written in language B, then B can therefore do everything A does, by running that interpreter.

Mathematically speaking yes. Practically speaking no.

Do you have an example in mind of two languages that can do very different things?

Postscript is Turing complete. Now go write an interpreter for C / C++ with it, and use it to play Call of Duty.

You can write an interpreter for C/C++ with it.

Hypothetically speaking it would compile and run the source code for Call of Duty.

Practically speaking however, it would not work. This abomination would not have access to directX, game controller inputs, sound, multiplayer / networking because postscript doesn't have those things, and therefore the interpreted C code would not have those things.

Those things aren't required to be Turing Complete, but they are required to play Call of Duty on a modern PC precisely the same way one might play it.

You could create something mathematically equivalent of the computation required for Call of Duty, but you could not "play" it precisely the same way.

If one were to build the relevant api/libraries and make them available to postscript then you could, but that doesn't exist right now and it's worth pointing out that those graphics, sound, networking, and input APIs could not themselves be written entirely in postscript.

So while postscript and C are mathematically equivalent in a Church-Turing thesis sense they really aren't equivalent on real hardware in the real world.

You could not start with what is in the world today, and writing postscript, only postscript code, and nothing else, come up with a playable call of duty.

sure it does. If you sandbox J, it's sandboxed too (1)

raymorris (2726007) | about 2 months ago | (#47409609)

If you sandbox Java in the browser, or sandbox a plugin written in C, it can't access DirectX either. The fact that people often choose to run a program in a sandbox doesn't mean anything about the language(s) the program is written in. Try writing a C compiler in C. It's not easy in any language. It's possible in any.

ps - I wouldn't want to write COD in Postscript (1)

raymorris (2726007) | about 2 months ago | (#47409657)

Ps, it would certainly be EASIER to write Call of Duty in some languages than it would in others. It would be difficult to get it to run QUICKLY in some languages (actually that's true of all languages). It could be done, though, and that's the point. The question isn't what CAN the language do, the question is what it's best suited for. Just because you CAN write a pixel shader in Perl doesn't mean you should.

Re:sure it does. If you sandbox J, it's sandboxed (1)

vux984 (928602) | about 2 months ago | (#47410141)

The fact that people often choose to run a program in a sandbox doesn't mean anything about the language(s) the program is written in.

The fact that you can theoretically put any language into a given sandbox or theoretically take it out of one is not equivalent to a real ability to actually do it in the real world today.

It's not easy in any language. It's possible in any.

Imagine a turing complete toy language which only operates on binary values. The only implementation of that language allocates a byte to hold a one or a zero. An "8 bit integer" would require 8 bytes of computer memory. Horribly wasteful I know, but it's still a turing complete language.

You cannot implement an x86 C compiler in this language. (At least not today.) Not because the language itself is incapable of computing it, but because the implementation of the language lacks the ability to output valid x86 machine code that an x86 CPU will execute. If, for example, my executable program needs to have its first byte as 11010001, this language cannot output that. I'll get

00000001, 00000001, 00000000, 00000001, 00000000, 00000000, 00000000, 00000001

which can be shown to be mathematically equivalent by a simple function of ignoring the leading 7 bits of each byte and placing the remaining 8th bit into a single byte. BUT this language lacks the capability to actually do THAT in practice.
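For reference, the "simple function" is easy to write in a language whose implementation does expose real bytes. The sketch below is in Python rather than the hypothetical toy language, and the name `pack_bits` is an assumption for illustration. It takes the eight one-bit-per-byte cells listed above and collapses them into one byte value.

```python
def pack_bits(cells):
    """Collapse eight one-bit-per-byte cells into a single byte value."""
    if len(cells) != 8:
        raise ValueError("expected exactly 8 cells")
    value = 0
    for cell in cells:
        # Keep only the lowest bit of each cell, ignoring its 7 leading bits.
        value = (value << 1) | (cell & 1)
    return value


# The eight bytes the toy implementation would emit.
toy_output = [0b00000001, 0b00000001, 0b00000000, 0b00000001,
              0b00000000, 0b00000000, 0b00000000, 0b00000001]

packed = pack_bits(toy_output)
print(format(packed, "08b"))  # 11010001
```

The point of the argument is that the toy implementation offers no equivalent of the shift-and-or step acting on a physical byte, even though it can compute the same value in its own representation.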

I could even write an x86 CPU emulator in my toy language and use it to run my "equivalent to x86 but not x86" machine code. The emulator would emulate the 32-bit CPU registers with 32 bytes each containing 1 bit, etc...

But no matter how much I twist and contort, I can't get 11010001 into a single hardware byte. I don't need to do that for the language to be turing complete, since it can *simulate* the ability to do that, without actually doing it.

And THAT is my point. Church-Turing is satisfied by simulation. A simulation of a thing isn't necessarily precisely the same thing as being able to actually do the thing directly.

Now of course, one could re-implement the language differently (allowing bits to be set within a physical byte), and then one could do this. But the reimplementation would have to be done in a different language -- one simply could not bootstrap what was needed from the original implementation of the language.

Despite it being Turing complete.

Interesting, but not a Turing machine, unless is (1)

raymorris (2726007) | about 2 months ago | (#47410635)

We're way off in the weeds here, of course, but that's cool. I don't mind playing in the weeds.

What you've done there is analogous to Dear Leader's argument "it's Constitutional because it is not a tax and is a tax". You've tried to say "it can write the single value 00000001, which is eight values". Either that's one value or eight, pick one.

The definition of a Turing machine requires very few capabilities. One of the very few things required by the definition of a Turing machine is that it has to be able to update memory one value at a time (block writes aren't good enough). That's the DEFINITION of a Turing machine - it's a machine that writes individual symbols to a strip of tape or other storage.

You've defined a language that can only update eight bits at a time, and additionally you've said it updates them only in certain patterns. That's not Turing complete.

If we want it to be Turing complete, we can interpret it as one value by saying that the LANGUAGE writes "1" and the HARD DRIVE happens to store that physically with eight molecules. The language would then be Turing complete since it's updating the single value "1". Fine. The language can write 1010101, 11111, 0000, 01010, or any other series since it's writing one value at a time. Perhaps the hard drive stores "10" physically as 1111111100000000, but the hard drive is going to read back what was written to it. Write a "1", get a "1" back. That's part of the definition of Turing complete because the storage in a turing complete system can be like a dumb piece of paper - it doesn't change what you write to it. Given that the tape doesn't change what's written to it, the language can write valid machine code and get valid machine code back.

You can't have it both ways. If "1" is one value, it can write "1", then write "0", in whatever pattern is needed to produce valid machine code. If it can only write the eight separate values 0,0,0,0,0,0,0,1 that's not a Turing machine.
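A small simulator may help pin down what "writes individual symbols to a strip of tape" means. This is a minimal sketch under stated assumptions: the function name `run_tm`, the rule-table format `(state, symbol) -> (write, move, next_state)`, and the bit-inverting example machine are all illustrative choices, not anything defined in the thread.

```python
from collections import defaultdict


def run_tm(rules, tape_input, start="q0", halt="halt", blank="_", max_steps=10_000):
    """Run a single-tape Turing machine and return the final tape contents."""
    # The tape is unbounded in both directions; unvisited cells read as blank.
    tape = defaultdict(lambda: blank, enumerate(tape_input))
    state, head, steps = start, 0, 0
    while state != halt:
        if steps >= max_steps:
            raise RuntimeError("step limit reached without halting")
        symbol = tape[head]
        write, move, state = rules[(state, symbol)]
        tape[head] = write                 # exactly one cell is written per step
        head += 1 if move == "R" else -1   # then the head moves one cell
        steps += 1
    if not tape:
        return ""
    cells = (tape[i] for i in range(min(tape), max(tape) + 1))
    return "".join(cells).strip(blank)


# Example machine: flip every bit, halt at the first blank.
INVERT = {
    ("q0", "0"): ("1", "R", "q0"),
    ("q0", "1"): ("0", "R", "q0"),
    ("q0", "_"): ("_", "R", "halt"),
}

print(run_tm(INVERT, "1101"))  # 0010
```

How a cell is physically stored (a Python string in a dict here, a byte or eight molecules elsewhere) is invisible to the rule table, which is the distinction both posters are circling.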

Re:Interesting, but not a Turing machine, unless i (1)

vux984 (928602) | about 2 months ago | (#47411145)

We're way off in the weeds here, of course, but that's cool. I don't mind playing in the weeds.

Way out there. :)

You've defined a language that can only update eight bits at a time, and additionally you've said it updates them only in certain patterns. That's not Turing complete.

No you are mistaken.

From the point of view of the LANGUAGE, each bit is individually and directly accessible. All the language sees is

0, 1, 0, 1, 0 ...

The implementation of the language, however, runs on x86, and uses a byte to represent each 0 and each 1 (as 00000000 and 00000001 respectively).

When the language saves a file out to disk, it writes out its binary bits one bit at a time, but they are each saved as a byte. When it reads them back in, the byte 00000001 is read, and stored in memory as 00000001 in a byte... but the language, just sees 1.

The problem if you try to write a C compiler in the language is that, as far as the language is concerned, the first 8 bits of the program it compiled *IS* 11010001; however, what is in the physical computer memory to represent that is 8 bytes, each 00000000 or 00000001. What gets written to the file on the disk is the same. The representation of the compiled C program I generated is mathematically equivalent to the actual C program ... but it cannot be run as the representation is wrong.

One of the very few things required by the definition of a Turing machine is that it has to be able to update memory one value at a time

The definition of a turing machine places NO restriction on the representation of its "memory".

So you are mistaken or misunderstood what I've done. I am *absolutely* free to use arrays of 8 bit bytes that each contain one of 2 bit patterns to simulate a turing tape.

You've defined a language that can only update eight bits at a time, and additionally you've said it updates them only in certain patterns. That's not Turing complete.

The language doesn't see 8 bits at a time. The arrays of bit patterns are an implementation detail that the language is not 'aware' of.

If we want it to be Turing complete, we can interpret it as one value by saying that the LANGUAGE writes "1" and the HARD DRIVE happens to store that physically with eight molecules.

Ok... yes. Exactly right. Except that rather than molecules I'm asserting the language simply dumps it to the hard disk using the logical file system already in place (say by Windows or Linux or whatever) the way it's stored in memory... as a stream of bytes, each containing one of two patterns.

Fine. The language can write 1010101, 11111, 0000, 01010, or any other series since it's writing one value at a time. Perhaps the hard drive stores "10" physically as 1111111100000000, but the hard drive is going to read back what was written to it. Write a "1", get a "1" back.

Right again. So far so good. The language writes a "1" and it reads back a "1". Yes.

Given that the tape doesn't change what's written to it, the language can write valid machine code and get valid machine code back.

Swing and a miss... ok not a miss... foul ball.

It can read and write "valid" machine code back, subject to the constraint that we view it from within the language / language implementation. If we look at what is actually in the physical computer memory, it is not valid machine code from the physical computer's perspective.

This language can compute results that are logically equivalent to machine code, but they are not actually usable as machine code. We can't simply set the ACTUAL CPU instruction pointer to the spot in physical computer memory where I'm storing my compiled C program, because it is just a sequence of 00000001 and 00000000 bytes ... a representation of the machine code, but not usable machine code itself.

Further because the language *implementation* provides no way of setting the physical bits of the *underlying* computer in an arbitrary way, there is no way to directly compile to usable machine code using this implementation of the language. DESPITE it being a Turing complete language, and despite the implementation being sufficient to simulate a Turing machine.

You can't have it both ways.

Can and do. We have 2 Turing machines!

The language / implementation / system only needs to be internally consistent with the requirements for Turing completeness to be Turing complete. I can *layer* Turing machines such that one runs on top of another one -- which is exactly what I've done by taking an actual computer (Turing machine 1) and implementing my toy language on it (Turing machine 2). TM1 is therefore effectively simulating TM2, which is fine. But there is no requirement that TM2 be able to directly control TM1 for TM2 to be Turing complete.

TM2 -- might be able to control TM1 -- and in fact in many such layering scenarios this IS possible. But it is not a requirement, and some scenarios exist where TM2 can't control TM1.

Clearly then not all Turing machines are equal in all ways. ;)

They ARE of course equal in the specific way identified by the Church-Turing thesis, which effectively states (amongst other things) that any TM can simulate any other TM (implying that any TM can 'host' another TM).

But it does NOT make any claims about the hosted (or simulated) TM being able to control or directly reprogram the simulator simulating it (ie re-program and control its own host). And it turns out there exist TMs which CAN directly reprogram the TM hosting them, and TMs that cannot.

So to illustrate:

x86 PC (TM1) can simulate ARM PC (TM2), and then of course TM2 can turn around and emulate an x86 PC (TM3)... etc ad nauseam per Church-Turing.

Suppose in this case, though, that TM2, being an emulation of an ARM-based system, was implemented such that it can arrange not only its own "simulated memory" in arbitrary ways but also, effectively, the memory of TM1. Then it can also, at least in theory, emit machine code and directly reprogram the TM1 that is simulating it.

It can control and program the *actual* TM1, not just TM3 which is itself a simulation of the same type of computer that TM1 is.

Church-Turing only requires that TM2 be able to simulate TM3, not control TM1.

contrast to:

x86 PC (TM1) can simulate my-toy-turingmachine (TM4), and TM4, being Turing complete, CAN simulate an x86 PC (TM5).

Because the implementation of TM4 lacks the ability to arrange the memory of TM1 in arbitrary ways, it cannot directly control TM1, even though it can simulate a machine equivalent to TM1. (In this example that's TM5.)

Re:Interesting, but not a Turing machine, unless i (0)

Anonymous Coward | about 2 months ago | (#47412267)

this is most interesting...why not post this as an article on the slashdot...

cuz this has nothing though to do with this books review.

Re:Question, what does R do that other lingos cann (0)

Anonymous Coward | about 2 months ago | (#47416139)

R has more than just the statistical functions. It has language syntax and assumptions that are different than other languages. Also it is class based. Some of the base-level types are arrays, matrices, vectors and data frames. Personally, I don't know the subtle differences (I am not an R programmer; I work with R programmers).
Example: Consider a classic array of numbers (integers, not reals), 5 columns (called a, b, c, d, e) by N rows; the goal is to square each and every cell.
In most languages, a simple bit of code would be a loop over the rows, within which each column is squared.
Assuming the array is called M, in R you could simply code:

M <- M^2

and that's it. You do not need to be concerned about either the number of rows or the number of columns, because this functionality is built into the language itself.
At my company the R programmers create plots, graphs, and such, while I organize, categorize and present them on a web application.
Using R (or SAS) or another data-centric programming language is a different way of thinking.
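For contrast with the element-wise squaring described above, here is a rough sketch of both styles in plain Python using only built-in lists. The function names `square_with_loops` and `square_all` and the sample data are assumptions for illustration. Plain Python lists have no built-in element-wise arithmetic, so the nested comprehension is only the closest standard-library analogue of the R expression, not an equivalent.

```python
def square_with_loops(rows):
    """Square every cell with explicit loops over rows and columns."""
    result = []
    for row in rows:
        new_row = []
        for cell in row:
            new_row.append(cell * cell)
        result.append(new_row)
    return result


def square_all(rows):
    """Square every cell without tracking row or column counts by hand."""
    return [[cell ** 2 for cell in row] for row in rows]


# Sample data: 2 rows by 5 columns (a, b, c, d, e).
M = [[1, 2, 3, 4, 5],
     [6, 7, 8, 9, 10]]

print(square_with_loops(M))
print(square_all(M))
```

Both functions return the same table. The difference the comment points at is that in R the whole-array operation is part of the language, so neither the loop nor the comprehension has to be written at all.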

Re:Question, what does R do that other lingos cann (1)

Anonymous Coward | about 2 months ago | (#47402199)

http://en.wikipedia.org/wiki/R_%28programming_language%29

R is a free software programming language and software environment for statistical computing and graphics. The R language is widely used among statisticians and data miners for developing statistical software[2][3] and data analysis.[3] Polls and surveys of data miners are showing R's popularity has increased substantially in recent years.[4][5][6]

R is an implementation of the S programming language combined with lexical scoping semantics inspired by Scheme. S was created by John Chambers while at Bell Labs. R was created by Ross Ihaka and Robert Gentleman[7] at the University of Auckland, New Zealand, and is currently developed by the R Development Core Team, of which Chambers is a member. R is named partly after the first names of the first two R authors and partly as a play on the name of S.[8]

R is a GNU project.[9][10] The source code for the R software environment is written primarily in C, Fortran, and R.[11] R is freely available under the GNU General Public License, and pre-compiled binary versions are provided for various operating systems. R uses a command line interface; however, several graphical user interfaces are available for use with R.

Re:Question, what does R do that other lingos cann (0)

Anonymous Coward | about 2 months ago | (#47440933)

Ready...but not to go.

Basically Bullshit (0)

gweihir (88907) | about 2 months ago | (#47403521)

This may have some use against script-kiddies, bot-nets and similarly non-sophisticated adversaries. It is worse than nothing against other adversaries, as it creates a false sense of security.

Re:Basically Bullshit (0)

Anonymous Coward | about 2 months ago | (#47403757)

What are you responding to?

It has nothing to do with the book.

Re:Basically Bullshit (0)

Anonymous Coward | about 2 months ago | (#47404521)

Recursive...your comment on Basically Bullshit is what your comment is.

Re:Basically Bullshit (0)

Anonymous Coward | about 2 months ago | (#47412289)

shut up...u r an angry man!

Re:Basically Bullshit (1)

strikethree (811449) | about 2 months ago | (#47428705)

Hm. I am going to have to disagree with you there. A false sense of security can be gleaned from such data; however, a false sense of security can be had from NO information at all. A false sense of security is a failing in the security practitioner, not a result of the data. For example, let's say someone has done this analysis and thinks they are secure and then reads your comment. They then know the limitations of what their data can expose and can continue to look for more subtle traces leading to discovery of more sophisticated agents operating within their systems.

A false sense of security is in essence another formulation of what I call the arrogance problem. Always keep your mind open to new ideas and interpretations and you can avoid this issue more often than not. I hope you did not get modded down for displaying a potential problem. You were a bit harsh but skins should not be thin here.

To be fair, it is always good to eliminate the obvious first. Stopping there is stupid, but starting there is good.

Re:Basically Bullshit (0)

Anonymous Coward | about 2 months ago | (#47440963)

you make zero...wait...less than zero sense!
