Peter Lubell-Doughtie

Helioid Preview

Thursday, February 24, 2011

A couple of months ago we opened the Helioid development version to the public, but we have not yet announced it. We have since migrated servers and tweaked the algorithms, which has significantly improved performance. There is still a lot of work to be done, but we believe this is a healthy starting point.

The graphic below illustrates how you can use Helioid to go from a question or idea to an organized list of results that helps you build better search queries.

Helioid infographic

This version of Helioid is a meta-search engine offering category-based personalization. Below is a screenshot of the results after querying Helioid:

Helioid screenshot

By interacting with the categories on the left you can choose the results to be shown on the right. Please let us know if you have any suggestions, comments, criticisms, or complaints; anything at all. Try out the Helioid preview now.

Thanks to Ben de Jesus for helping with site design and user experience.

Improving Result Diversity using PLSA at DIR2011

Monday, February 07, 2011

Katja and I presented a poster regarding our paper entitled "Improving Result Diversity using PLSA" at DIR 2011 on Friday. The poster is below:

The abstract is included below. The full paper is available for download as well.

IA-SELECT is a recently developed algorithm for increasing the diversity of a search result set by reordering an original document list based on manually generated clusters. In this paper we extend this approach to create a diversification framework in which arbitrary clustering methods can be used, and where the influence of clusters can be balanced against the original rank of documents. We study whether clusters that are automatically generated using probabilistic latent semantic analysis (PLSA) can compete with manually created clusters, and investigate how balancing the influence of clusters and original document rank affects diversity scores. As there are currently few datasets for evaluating diversity, we develop a new dataset, which is released with this paper. Our results show that diversification using PLSA can improve diversity, but that there is a large gap in performance between automatically and manually created clusters.
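To make the idea of balancing clusters against the original rank concrete, here is a minimal sketch of a greedy IA-SELECT-style re-ranker. It is not the exact formulation from the paper: the linear mixing weight `lam`, the reciprocal-rank score, and all function and parameter names are assumptions chosen for illustration. The cluster probabilities could come from manual labels or from a method such as PLSA.

```python
from typing import Dict, List


def diversify(
    ranked_docs: List[str],
    cluster_prior: Dict[str, float],
    doc_cluster_prob: Dict[str, Dict[str, float]],
    k: int,
    lam: float = 0.5,
) -> List[str]:
    """Greedily pick k documents, trading cluster coverage against rank.

    ranked_docs: document ids in their original rank order.
    cluster_prior: cluster -> assumed probability the query is about it.
    doc_cluster_prob: doc -> {cluster: probability the doc satisfies it}.
    lam: assumed mixing weight; 1.0 uses clusters only, 0.0 rank only.
    """
    remaining = dict(cluster_prior)  # weight of each cluster not yet covered
    # Reciprocal-rank score in (0, 1]; an illustrative choice, not from the paper.
    rank_score = {d: 1.0 / (i + 1) for i, d in enumerate(ranked_docs)}
    candidates = list(ranked_docs)
    selected: List[str] = []

    def score(doc: str) -> float:
        probs = doc_cluster_prob.get(doc, {})
        gain = sum(remaining.get(c, 0.0) * p for c, p in probs.items())
        return lam * gain + (1.0 - lam) * rank_score[doc]

    while candidates and len(selected) < k:
        best = max(candidates, key=score)  # ties resolve to the higher-ranked doc
        selected.append(best)
        candidates.remove(best)
        # Discount clusters the chosen document already covers.
        for c, p in doc_cluster_prob.get(best, {}).items():
            if c in remaining:
                remaining[c] *= 1.0 - p
    return selected
```

With `lam=1.0` a second document from an already-covered cluster is pushed down in favor of one from an uncovered cluster; with `lam=0.0` the original order is returned unchanged.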

A P2P Telecom

Monday, November 09, 2009

Update 02/2011: I just became aware of The Serval Project, which addresses the same idea:

Communicate anywhere, any time … without infrastructure, without mobile towers, without satellites, without wifi hotspots, and without carriers. Use existing off-the-shelf mobile cell phone handsets. Use your existing mobile phone number wherever you go, and never pay roaming charges again.

They have code for their phone app and their distributed naming system available.

Below I present the idea for a P2P Telecom: a telecom in which traffic is routed from sender to receiver through peers, not through a central hub controlled by a company.

Cons of a P2P Telecom.

You will only benefit from a P2P telecom if the people you're communicating with are using the P2P telecom.

Pros of a P2P Telecom.

Since users will be controlling the network, users will be setting the prices and it will benefit all users for prices to be as low as possible. This is a stark contrast to the current system where prices are set by a handful of powerful companies with little worry of losing subscribers.
No additional infrastructure will need to be put in place. In fact, towers, antennas, relays, etc. can be removed from densely populated areas because they are unnecessary for a P2P Telecom to function.
A P2P Telecom will have more privacy because you will trust no one but yourself and encryption levels you control. If you want to use a theoretically unbreakable but data-intensive encryption protocol, that's up to you, and no one will know your data but you and the other user you are communicating with. Like any communications system, the privacy of a P2P Telecom is not perfect but, as any cryptographer worthy of their title will tell you, the system is open and this is leaps and bounds more secure than the closed systems of big industry telecoms.
There is no centralized control. This helps to enable better data security and increased privacy, among other things. Without centralized control no one will be able to shut off the network and there will be no data overloads in case of an emergency (such as happened in New York City on September 11, 2001).
There will be increased capacity. This lack of centralized control enables increased capacity and redundancy in populated areas beyond anything approachable by current industry telecoms.
A P2P Telecom can transfer any form of data. Because of their popularity, phone calls are the most obvious form of data to be transferred, but if desired the network can transfer anything else, such as video or raw data.

It is clear to me that the pros of a P2P Telecom definitively outweigh the cons and it is time we start building this network, which turns out to be rather simple.

The Implementation of a P2P Telecom

The implementation is based on modifying the functionality of existing cell phones, be they GSM, CDMA, or some other lesser-known system.

Cell phones must be retrofitted so they can act as peers with one another. At their core, cell phones send and receive signals; these phones must be modified so that they can receive and send signals directly from and to other phones.

Accomplishing this may or may not require hardware modification. If hardware modification is required, we must determine how this can be done. One simple method would be the use of a modified SIM chip. If the change needed is more fundamental, a systematic approach to performing it can be developed.
Software modification will be required. We must:
1. Change what type of signals the phone looks for so that it can find signals from other phones
2. Change how the phone interprets the signals it finds so that it can interpret them as:
  - Data passing. The phone must know how to move data from itself to other peers. This will require an algorithm informing the phone of the priorities for where to forward data it receives. This algorithm will benefit from, but not require, partial location awareness of peers and unique identification of peers. This can all be done while maintaining anonymity.
  - Data receiving. The phone must know that this data is destined for it and know how to handle decoding of the data.
  - Data sending. The phone must be able to uniquely encode its data so it is only decodable by the receiver.
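
As a rough illustration of the pass/receive decision described above, here is a minimal sketch of how a phone might handle an incoming packet. Everything in it is an assumption made for illustration: the `Packet` fields, the hop budget, pseudonymous peer identifiers, and the use of a coarse location hint with greedy "closest neighbor" forwarding. A real forwarding algorithm would need to handle dead ends and untrusted peers.

```python
from dataclasses import dataclass
from math import dist
from typing import Dict, Optional, Tuple

Position = Tuple[float, float]


@dataclass
class Packet:
    dest_id: str    # pseudonymous identifier of the receiver (assumed)
    payload: bytes  # assumed to be already encrypted for the receiver
    hops_left: int  # hop budget so packets cannot circulate forever


def handle_packet(
    my_id: str,
    packet: Packet,
    neighbors: Dict[str, Position],
    dest_hint: Optional[Position] = None,
) -> Tuple[str, Optional[str]]:
    """Return ("deliver", None), ("forward", peer_id) or ("drop", None)."""
    if packet.dest_id == my_id:
        return ("deliver", None)  # data receiving: this packet is for us
    if packet.hops_left <= 0 or not neighbors:
        return ("drop", None)
    packet.hops_left -= 1
    if packet.dest_id in neighbors:
        return ("forward", packet.dest_id)  # receiver is in direct range
    if dest_hint is None:
        # No location information: fall back to a deterministic choice.
        return ("forward", min(neighbors))
    # Partial location awareness: prefer the neighbor nearest the hint.
    nearest = min(neighbors, key=lambda n: dist(neighbors[n], dest_hint))
    return ("forward", nearest)
```

The location hint is optional, matching the point above that forwarding benefits from, but does not require, location awareness.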

The Long Distance Uplinks in a P2P Telecom

A problem with the described P2P telecom is that if you want to call someone outside of your peer neighborhood's range, or someone who doesn't use the P2P system, you'll have to place the call over the existing industry telecom network. A solution to this problem is to give every P2P neighborhood an internet uplink and connect multiple neighborhoods with a VoIP service. In this manner, if either of the above situations occurs, the call can be placed using VoIP technologies.

The number of peers with access to the uplink, and therefore the bandwidth of the circuit between the peers and the uplink, will depend on the volume of calls going out of the local neighborhood.

The Algorithms for a P2P Telecom

Finding a peer and establishing a circuit
Sending data through a circuit - Encrypt data using public/private key system
Throttling of uplink bandwidth - Better to deny new calls that would increase latency on existing calls
Location anonymity
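
The uplink throttling item can be sketched as simple admission control: a new call is refused if admitting it would leave existing calls with less than their full bandwidth. This is only a sketch under assumptions; the class name, the fixed per-call bandwidth, and the 64 kbps default are illustrative and not part of the design above.

```python
class UplinkAdmission:
    """Admit calls onto a shared uplink only while capacity remains."""

    def __init__(self, capacity_kbps: float, per_call_kbps: float = 64.0):
        self.capacity_kbps = capacity_kbps
        self.per_call_kbps = per_call_kbps  # assumed fixed cost per call
        self.active = set()

    def request(self, call_id: str) -> bool:
        """Return True if the call is (or already was) admitted."""
        if call_id in self.active:
            return True
        needed = (len(self.active) + 1) * self.per_call_kbps
        if needed > self.capacity_kbps:
            return False  # deny rather than degrade calls in progress
        self.active.add(call_id)
        return True

    def release(self, call_id: str) -> None:
        """Free the bandwidth held by a finished call."""
        self.active.discard(call_id)
```

For example, an uplink with `capacity_kbps=128` admits two calls, refuses a third, and admits it once one of the first two is released.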

Information Security Bookmarks

Saturday, November 07, 2009

Below is a list of bookmarks dealing with information security. The list has not been thoroughly reviewed, so please let me know if something is out of date and should be removed. One of the most helpful sites to me when I began researching info sec was a blog post with a bunch of bookmarks; hopefully this will be similarly helpful to you.

Information Security Companies

Information Security Research

VoIP Watermarking Defenses

Wednesday, February 20, 2008

A couple of years ago (2005) the Rome Air Force Base sponsored research [1] into de-anonymizing VoIP traffic. The researchers developed a modification to the Linux kernel that inserts a watermark into Skype VoIP traffic passing through a low-latency anonymizing network. A 24-bit watermark is inserted by modulating the inter-packet timing of data packets. This is essentially the establishment of a covert channel through a timing attack.
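
To give a feel for how bits can ride on inter-packet timing, here is a deliberately simplified sketch. It is not the scheme from [1]: it adjusts a list of inter-packet delays directly (a real gateway can only delay packets), and the grouping of delays, the redundancy `r`, and the function names are assumptions. Only the 24-bit length and the 3 ms adjustment come from the description above.

```python
from statistics import mean
from typing import List


def embed(ipds_ms: List[float], bits: List[int], r: int = 4,
          delta_ms: float = 3.0) -> List[float]:
    """Return inter-packet delays that carry `bits`.

    Each bit uses 2*r consecutive delays: r in group A, then r in group B.
    A 1 makes mean(A) - mean(B) larger by delta_ms; a 0 makes it smaller.
    """
    if len(ipds_ms) < len(bits) * 2 * r:
        raise ValueError("not enough packets to carry the watermark")
    out = list(ipds_ms)
    for i, bit in enumerate(bits):
        start = i * 2 * r
        shift = delta_ms / 2 if bit else -delta_ms / 2
        for j in range(r):
            out[start + j] += shift      # group A
            out[start + r + j] -= shift  # group B
    return out


def decode(ipds_ms: List[float], n_bits: int, r: int = 4) -> List[int]:
    """Recover bits from the sign of mean(A) - mean(B) for each group."""
    bits = []
    for i in range(n_bits):
        start = i * 2 * r
        a = mean(ipds_ms[start:start + r])
        b = mean(ipds_ms[start + r:start + 2 * r])
        bits.append(1 if a > b else 0)
    return bits
```

Decoding is reliable only while natural jitter in the group means stays below the embedded shift, which is why recovery is probabilistic on a real network.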

The attacker reads the probabilistically hidden bits in the traffic to reconstruct the watermark and identify the originating and terminating nodes of a VoIP call. A defense against this would be to scrub your outgoing traffic to remove the covert channel, or to increase the probability of error in bit recovery beyond the acceptable rate. The attacker is not manipulating packets as they leave the origin, since then they would presumably already know the origin. The suggested implementation is to watermark packets as they transit through a VoIP gateway. Because of this it is necessary to scrub packets beyond the gateway, after they have been marked.
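
One way to picture scrubbing is to buffer packets briefly and release them on a fixed cadence, so that any timing modulation added upstream is flattened. The sketch below is an assumption-laden illustration, not a tested defense: the 20 ms interval, the 40 ms buffer, and the policy of dropping late packets are all hypothetical choices, and the buffer itself adds latency to the call.

```python
from typing import List, Optional


def scrub_timing(arrivals_ms: List[float], interval_ms: float = 20.0,
                 buffer_ms: float = 40.0) -> List[Optional[float]]:
    """Map arrival times to release times on a constant cadence.

    Packet i is released at first_arrival + buffer_ms + i * interval_ms.
    A packet that arrives after its slot is marked None (dropped).
    """
    if not arrivals_ms:
        return []
    start = arrivals_ms[0] + buffer_ms
    releases: List[Optional[float]] = []
    for i, arrival in enumerate(arrivals_ms):
        slot = start + i * interval_ms
        releases.append(slot if arrival <= slot else None)
    return releases
```

For arrivals at 0, 23, 37, 63 and 77 ms the releases fall at 40, 60, 80, 100 and 120 ms, so the outgoing inter-packet delays are all exactly 20 ms regardless of how the incoming ones were perturbed.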

More interesting would be to alter the packet timing in a controlled manner and embed bits of your choosing. If you had enough knowledge as to how bit patterns are assigned to identities you could arbitrarily alter your identity and pose as another. You could also add incorrect watermarks to random VoIP traffic.

To detect a watermark you can exploit the embedding process. The technique relies on existing latency in the VoIP call and is able to function with around 20 ms to 30 ms of latency by making a 3 ms adjustment to packet arrival times. One suggestion is to keep latency as low as possible, which makes a watermark easier to detect because the latency would have to be adjusted to unexpected levels. It may not be feasible to keep latency low for a long period of time, but that is not required. Latency could intermittently be pushed to the lowest possible level and a check for embedded bits performed. The method uses the existing latency in the first minutes of the call to determine how much added latency is acceptable. Exploiting this, the first minutes (or so) of the call could be made with high, but still believable, latency so that the attacker embeds bits with a correspondingly larger adjustment. Once a watermark has been embedded, the latency could be significantly reduced and the alteration of packet timing should become noticeable.

Covert channels based on packet timing have many applications beyond de-anonymization and could be made very difficult to detect. Steganographic-style embedding in traffic is one possibility, as is watermarking for authentication purposes by the originating and terminating parties.

[1] S. Chen, S. Jajodia, and X. Wang. Tracking Anonymous Peer-to-Peer VoIP Calls on the Internet. In CCS '05. ACM, November 2005
