<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-25021721</id><updated>2012-02-13T23:50:02.219-05:00</updated><title type='text'>Qing Zhang's technical cube</title><subtitle type='html'>My technical memo and random thoughts, mostly in database systems, business intelligence, and information security and privacy.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://qingzhang-tech.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/25021721/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://qingzhang-tech.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Qing</name><uri>http://www.blogger.com/profile/10218094547702857298</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='23' src='http://1.bp.blogspot.com/-tNEFYH9qreE/Ti5j9FnwwbI/AAAAAAAABgE/AfRj3k447yM/s220/head.jpg'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>21</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-25021721.post-2268204575322506983</id><published>2011-11-20T01:55:00.001-05:00</published><updated>2011-11-20T01:57:15.986-05:00</updated><title type='text'>Stevey's Google Platforms Rant</title><content type='html'>You cannot miss this great and interesting article comparing platform architecture and IT management! 
http://steverant.pen.io/</content><link rel='related' href='http://steverant.pen.io/' title='Stevey&apos;s Google Platforms Rant'/><link rel='replies' type='application/atom+xml' href='http://qingzhang-tech.blogspot.com/feeds/2268204575322506983/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=25021721&amp;postID=2268204575322506983' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/25021721/posts/default/2268204575322506983'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/25021721/posts/default/2268204575322506983'/><link rel='alternate' type='text/html' href='http://qingzhang-tech.blogspot.com/2011/11/steveys-google-platforms-rant.html' title='Stevey&apos;s Google Platforms Rant'/><author><name>Qing</name><uri>http://www.blogger.com/profile/10218094547702857298</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='23' src='http://1.bp.blogspot.com/-tNEFYH9qreE/Ti5j9FnwwbI/AAAAAAAABgE/AfRj3k447yM/s220/head.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-25021721.post-7503564559834140641</id><published>2011-07-26T02:56:00.000-04:00</published><updated>2011-07-26T02:57:30.130-04:00</updated><title type='text'>some notes about In Memory Database</title><content type='html'>Accessing data in memory reduces I/O read activity during queries, which provides faster and more predictable performance than disk. 
In applications where response time is critical, such as telecommunications network equipment, main memory databases (MMDBs) are often used.&lt;br /&gt;&lt;br /&gt;ACID support&lt;br /&gt;Volatile memory-based MMDBs can, and often do, support the three ACID properties other than durability: atomicity, consistency and isolation. Many MMDBs add durability via the following mechanisms:&lt;br /&gt;- Snapshot files, or checkpoint images, which record the state of the database at a given moment in time. &lt;br /&gt;    These are typically generated periodically, or at least when the MMDB does a controlled shutdown. They offer only partial durability.&lt;br /&gt;For full durability, snapshots need to be supplemented by one of the following: &lt;br /&gt;&lt;br /&gt;- Transaction logging, &lt;br /&gt;   which records changes to the database in a journal file and facilitates automatic recovery. &lt;br /&gt;- Non-volatile random access memory (NVRAM), &lt;br /&gt;   usually static RAM backed up with battery power (battery RAM), or an electrically erasable programmable ROM (EEPROM). With this storage, the MMDB can recover the data store from its last consistent state upon reboot. &lt;br /&gt;- High availability (redundancy/replication), &lt;br /&gt;   which relies on database replication, with automatic failover to an identical standby database in the event of primary database failure. &lt;br /&gt;&lt;br /&gt;To protect against loss of data in the case of a complete system crash, replication of an MMDB is normally used in conjunction with one or more of the mechanisms listed above. &lt;br /&gt;&lt;br /&gt;--------------&lt;br /&gt;&lt;br /&gt;Oracle In-Memory Database Cache Architecture&lt;br /&gt;- shared libraries&lt;br /&gt;The database routines are implemented as shared libraries that are linked into the application. This is in contrast to a more conventional RDBMS, which is implemented as a collection of executable programs to which applications connect, typically over a client/server network. 
&lt;br /&gt;&lt;br /&gt;- memory-resident data structures&lt;br /&gt;The database is maintained in shared memory segments in the operating system and contains all user data, indexes, system catalogs, log buffers, lock tables and temp space.&lt;br /&gt;&lt;br /&gt;- database processes&lt;br /&gt;A separate process is assigned to each database to perform operations including the following tasks:&lt;br /&gt;&gt; Loading the database into memory from a checkpoint file on disk&lt;br /&gt;&gt; Recovering the database if it needs to be recovered after loading it into memory&lt;br /&gt;&gt; Performing periodic checkpoints in the background against the active database&lt;br /&gt;&gt; Detecting and handling deadlocks&lt;br /&gt;&gt; Performing data aging&lt;br /&gt;&gt; Writing log records to files&lt;br /&gt;&lt;br /&gt;- administrative programs&lt;br /&gt;Utility programs are explicitly invoked by users, scripts, or applications to perform services such as interactive SQL, bulk copy, backup and restore, database migration and system monitoring.&lt;br /&gt;&lt;br /&gt;- IMDB Cache &lt;br /&gt;A cache group is created to hold the cached data. It is a collection of one or more tables arranged in a logical hierarchy by using primary key and foreign key relationships. Each table in a cache group is related to an Oracle database table. A cache table can contain all rows and columns of the related table, or a subset of them. 
Cache groups support these features:&lt;br /&gt;&gt; Applications can read from and write to cache groups.&lt;br /&gt;&gt; Cache groups can be refreshed from Oracle data automatically or manually.&lt;br /&gt;&gt; Updates to cache groups can be propagated to Oracle tables automatically or manually.&lt;br /&gt;&gt; Changes to either Oracle tables or the cache group can be tracked automatically.&lt;br /&gt;&lt;br /&gt;When rows in a cache group are updated by applications, the corresponding rows in the Oracle tables can be updated synchronously as part of the same transaction, or asynchronously immediately afterward. The asynchronous configuration typically produces significantly higher throughput and much faster application response times.&lt;br /&gt;&lt;br /&gt;Changes that originate in the Oracle tables are refreshed into the cache by the cache agent.&lt;br /&gt;&lt;br /&gt;Each cache group has a root table that contains the primary key for the cache group. &lt;br /&gt;Rows in the root table may have one-to-many relationships with rows in child tables, &lt;br /&gt;each of which may have one-to-many relationships with rows in other child tables.&lt;br /&gt;&lt;br /&gt;Each primary key value in the root table specifies a cache instance. Cache instances form the unit of cache loading and cache aging. 
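&lt;br /&gt;&lt;br /&gt;A minimal sketch of the cache-instance idea in Python. This is not IMDB Cache code: the class and method names are hypothetical, and least recently used eviction is assumed here only as one possible usage-based aging policy. It shows a root row and its child rows being loaded and aged out together as one unit.

```python
from collections import OrderedDict


class CacheGroup:
    """Toy model: one cache instance = a root row plus its child rows."""

    def __init__(self, capacity):
        self.capacity = capacity        # max number of cache instances kept
        self.instances = OrderedDict()  # root primary key -> cache instance

    def load(self, root_key, root_row, child_rows):
        """Load a whole instance and return the root keys that were aged out."""
        self.instances[root_key] = {"root": root_row, "children": list(child_rows)}
        self.instances.move_to_end(root_key)  # mark as most recently used
        excess = max(0, len(self.instances) - self.capacity)
        aged_out = []
        for _ in range(excess):
            # Usage-based aging: drop the least recently used instance as a unit.
            old_key, _instance = self.instances.popitem(last=False)
            aged_out.append(old_key)
        return aged_out

    def get(self, root_key):
        """Return the cached instance, or None if it is not in the cache."""
        if root_key not in self.instances:
            return None
        self.instances.move_to_end(root_key)  # reading counts as usage
        return self.instances[root_key]


group = CacheGroup(capacity=2)
group.load(1, {"customer_id": 1}, [{"order_id": 10}, {"order_id": 11}])
group.load(2, {"customer_id": 2}, [])
group.get(1)  # touch instance 1, so instance 2 becomes the oldest
print(group.load(3, {"customer_id": 3}, [{"order_id": 30}]))  # [2]
```

In this sketch the child rows never leave the cache without their root row, which is the point of treating the cache instance as the unit of loading and aging.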
&lt;br /&gt;&lt;br /&gt;The most commonly used cache group types are:&lt;br /&gt;&gt; Read-only cache group: committed updates to Oracle tables are automatically refreshed to the corresponding cache tables in the IMDB Cache database.&lt;br /&gt;&gt; Asynchronous writethrough (AWT) cache group: committed updates to cache tables in the IMDB Cache database are automatically propagated to the corresponding Oracle tables asynchronously.&lt;br /&gt;&gt; Synchronous writethrough (SWT) cache group: committed updates to cache tables in the IMDB Cache database are automatically propagated to the corresponding Oracle tables synchronously.&lt;br /&gt;&gt; User managed cache group&lt;br /&gt;&lt;br /&gt;Cache groups can be either dynamically loaded or explicitly loaded.&lt;br /&gt;&gt; explicitly loaded cache groups: the application preloads data into the cache tables from the database using a load cache group operation.&lt;br /&gt;&gt; dynamic cache groups: cache instances are automatically loaded into the IMDB Cache from the database when the application references cache instances that are not already in the IMDB Cache. The use of dynamic cache groups is typically coupled with least recently used (LRU) aging.&lt;br /&gt;&lt;br /&gt;Ways to keep a cache group synchronized with the corresponding data in the Oracle tables:&lt;br /&gt;&gt; Autorefresh&lt;br /&gt;   * incremental autorefresh: updates only records that have been modified since the last refresh. &lt;br /&gt;      best when the table is updated often, but only a few rows are changed with each update.&lt;br /&gt;   * full autorefresh: refreshes the entire cache group at specified time intervals.&lt;br /&gt;      best if the table is updated infrequently (for example, once a day) and many rows are changed. 
&lt;br /&gt;&gt; Manual refresh &lt;br /&gt;       best if the application logic knows when the refresh should happen.&lt;br /&gt;&lt;br /&gt;Aging:&lt;br /&gt;Records can be aged out automatically, based on either usage or time.&lt;br /&gt;&lt;br /&gt;Passthrough feature&lt;br /&gt;Checks whether a SQL statement can be handled locally by the cached tables in the IMDB Cache or must be redirected to the Oracle database.</content><link rel='replies' type='application/atom+xml' href='http://qingzhang-tech.blogspot.com/feeds/7503564559834140641/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=25021721&amp;postID=7503564559834140641' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/25021721/posts/default/7503564559834140641'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/25021721/posts/default/7503564559834140641'/><link rel='alternate' type='text/html' href='http://qingzhang-tech.blogspot.com/2011/07/some-notes-about-in-memory-database.html' title='some notes about In Memory Database'/><author><name>Qing</name><uri>http://www.blogger.com/profile/10218094547702857298</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='23' src='http://1.bp.blogspot.com/-tNEFYH9qreE/Ti5j9FnwwbI/AAAAAAAABgE/AfRj3k447yM/s220/head.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-25021721.post-8924926374571576748</id><published>2011-06-26T10:43:00.004-04:00</published><updated>2011-06-26T12:06:01.301-04:00</updated><title type='text'>How to make contributions to Hadoop Hackathon</title><content type='html'>I joined the Apache Hadoop Hackathon 
meeting hosted by the LA Hadoop User Group at Shopzilla last week. Here's a detailed explanation of how you can contribute:&lt;br /&gt;&lt;a href="http://bit.ly/ifGaMc"&gt;http://bit.ly/ifGaMc&lt;/a&gt;&lt;br /&gt;Submitting patches, or simply running tests and identifying bugs, would be great.</content><link rel='replies' type='application/atom+xml' href='http://qingzhang-tech.blogspot.com/feeds/8924926374571576748/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=25021721&amp;postID=8924926374571576748' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/25021721/posts/default/8924926374571576748'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/25021721/posts/default/8924926374571576748'/><link rel='alternate' type='text/html' href='http://qingzhang-tech.blogspot.com/2011/06/how-to-make-contributions-to-hadoop.html' title='How to make contributions to Hadoop Hackathon'/><author><name>Qing</name><uri>http://www.blogger.com/profile/10218094547702857298</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='23' src='http://1.bp.blogspot.com/-tNEFYH9qreE/Ti5j9FnwwbI/AAAAAAAABgE/AfRj3k447yM/s220/head.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-25021721.post-2516502771537303415</id><published>2011-04-22T22:40:00.004-04:00</published><updated>2011-04-22T23:33:04.044-04:00</updated><title type='text'>The Amazon's lesson on Cloud Computing</title><content type='html'>I read this article in today's New York Times: &lt;a href="http://www.nytimes.com/2011/04/23/technology/23cloud.html?_r=1"&gt;Amazon’s Trouble 
Raises Cloud Computing Doubts&lt;/a&gt;. The difficulties EC2 ran into have affected many businesses on its platform.&lt;br /&gt;&lt;br /&gt;In my view, fault tolerance is still the key concern for big companies with critical data. That's why, despite the growing number of platforms and applications based on MapReduce, big banks and retailers still prefer traditional data warehouses such as Teradata, which provide more safety for their data storage (another important reason is their strong query optimization and efficient data analytics). &lt;br /&gt;&lt;br /&gt;In the long run, I believe Cloud Computing and MapReduce technologies will handle failures better and better. We have seen the same path with online trading platforms: when they first came out, systems broke here and there. But after several years, almost everyone was trading online. The current difficulties with Cloud and MapReduce will also go away as time goes on. Traditional data warehouse companies believe this too. Teradata, DB2, Exadata, Greenplum,... all of them are investigating, or have already proposed, hybrid models that combine cloud and MapReduce with their products.&lt;br /&gt;&lt;br /&gt;For now, it's still strongly recommended that businesses maintain some backup system/storage of their own until the cloud is resilient enough. Cloud service providers such as Amazon should give higher priority to fault-tolerant system design. If they cannot provide 24/7 availability at this time, they should at least make recovery quicker and more transparent to customers. In the current event, the impact has lasted two days. 
That is just far too long.</content><link rel='replies' type='application/atom+xml' href='http://qingzhang-tech.blogspot.com/feeds/2516502771537303415/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=25021721&amp;postID=2516502771537303415' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/25021721/posts/default/2516502771537303415'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/25021721/posts/default/2516502771537303415'/><link rel='alternate' type='text/html' href='http://qingzhang-tech.blogspot.com/2011/04/amazons-lesson-on-cloud-computing.html' title='The Amazon&apos;s lesson on Cloud Computing'/><author><name>Qing</name><uri>http://www.blogger.com/profile/10218094547702857298</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='23' src='http://1.bp.blogspot.com/-tNEFYH9qreE/Ti5j9FnwwbI/AAAAAAAABgE/AfRj3k447yM/s220/head.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-25021721.post-4147909281475035802</id><published>2011-04-17T15:22:00.004-04:00</published><updated>2011-04-17T15:31:47.307-04:00</updated><title type='text'>Facebook finally launches social deal</title><content type='html'>Refer to this link: &lt;a href="http://gigaom.com/2011/04/15/how-facebook-can-beat-groupon-by-making-deals-social/"&gt;http://gigaom.com/2011/04/15/how-facebook-can-beat-groupon-by-making-deals-social/&lt;/a&gt;.&lt;br /&gt;As I wrote in my post last December, I thought Groupon was an easy-to-duplicate model and should join Facebook instead of being bought by Google or going for an independent IPO. 
Now it seems Groupon has missed this opportunity, as Facebook has already started its own social deals. Sharing deals within a social context should be more effective: friends often have similar interests and tastes, and recommendations from friends automatically carry trust and reputation. As long as Facebook doesn't make stupid mistakes in the future, Groupon will be in big trouble.</content><link rel='replies' type='application/atom+xml' href='http://qingzhang-tech.blogspot.com/feeds/4147909281475035802/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=25021721&amp;postID=4147909281475035802' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/25021721/posts/default/4147909281475035802'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/25021721/posts/default/4147909281475035802'/><link rel='alternate' type='text/html' href='http://qingzhang-tech.blogspot.com/2011/04/facebook-finally-launches-social-deal.html' title='Facebook finally launches social deal'/><author><name>Qing</name><uri>http://www.blogger.com/profile/10218094547702857298</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='23' src='http://1.bp.blogspot.com/-tNEFYH9qreE/Ti5j9FnwwbI/AAAAAAAABgE/AfRj3k447yM/s220/head.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-25021721.post-2471766025195075736</id><published>2011-04-15T10:14:00.003-04:00</published><updated>2011-04-15T10:20:31.658-04:00</updated><title type='text'>Amazing work of object tracking!</title><content type='html'>Can't wait for new products adopting this technique!&lt;br 
/&gt;&lt;a href="http://www.youtube.com/watch?v=1GhNXHCQGsM"&gt;http://www.youtube.com/watch?v=1GhNXHCQGsM&lt;/a&gt;</content><link rel='replies' type='application/atom+xml' href='http://qingzhang-tech.blogspot.com/feeds/2471766025195075736/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=25021721&amp;postID=2471766025195075736' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/25021721/posts/default/2471766025195075736'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/25021721/posts/default/2471766025195075736'/><link rel='alternate' type='text/html' href='http://qingzhang-tech.blogspot.com/2011/04/amazing-work-of-object-tracking.html' title='Amazing work of object tracking!'/><author><name>Qing</name><uri>http://www.blogger.com/profile/10218094547702857298</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='23' src='http://1.bp.blogspot.com/-tNEFYH9qreE/Ti5j9FnwwbI/AAAAAAAABgE/AfRj3k447yM/s220/head.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-25021721.post-5630815354605675430</id><published>2011-03-31T14:36:00.003-04:00</published><updated>2011-03-31T15:15:13.383-04:00</updated><title type='text'>Opinions about ActiveBase webcast</title><content type='html'>I attended the ActiveBase webcast just now. A brief introduction to their product can be found here: &lt;a href="http://www.dynamicdatamasking.com/activesecurity.html"&gt;link&lt;/a&gt;. 
Here's a quick summary of their product:&lt;br /&gt;&lt;br /&gt;ActiveBase, which currently supports Oracle and SQL Server, provides SQL proxies based on query rewriting to dynamically anonymize data. No changes to the application or the database are needed. The product can be installed either (1) on each database server, or (2) on dedicated application servers acting as proxies/firewalls. &lt;br /&gt;&lt;br /&gt;The protection solutions they provide: masking, scrambling, hiding, auditing and blocking of fields. They adopt Role-Based Access Control to determine the anonymization methods and the results returned to the user.&lt;br /&gt;&lt;br /&gt;&lt;em&gt;Major concerns I have about their product:&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;- It seems the query rewriting technique needs a lot of manual effort for each type of query. It is hard to automate for ad hoc queries, so there is a big maintenance cost.&lt;br /&gt;&lt;br /&gt;- They don't offer any privacy guarantee for their solution. Specifically, they use the query proxy technique to dynamically rewrite each query and return answers. Could an attacker carefully design queries to obtain multiple versions of the anonymized data, compare them, and thereby cause information leakage?&lt;br /&gt;&lt;br /&gt;- How usable is the anonymized dataset? With simple masking, the data doesn't have much utility after anonymization.&lt;br /&gt;&lt;br /&gt;Anyway, I'm still glad to see that privacy and anonymization are getting more and more attention, and that new products are being developed. 
An interesting question for future work: what special techniques are needed to provide anonymity for NoSQL databases and cloud databases?</content><link rel='replies' type='application/atom+xml' href='http://qingzhang-tech.blogspot.com/feeds/5630815354605675430/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=25021721&amp;postID=5630815354605675430' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/25021721/posts/default/5630815354605675430'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/25021721/posts/default/5630815354605675430'/><link rel='alternate' type='text/html' href='http://qingzhang-tech.blogspot.com/2011/03/opinions-about-activebase-webcast.html' title='Opinions about ActiveBase webcast'/><author><name>Qing</name><uri>http://www.blogger.com/profile/10218094547702857298</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='23' src='http://1.bp.blogspot.com/-tNEFYH9qreE/Ti5j9FnwwbI/AAAAAAAABgE/AfRj3k447yM/s220/head.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-25021721.post-2808227475320701956</id><published>2011-03-14T22:18:00.002-04:00</published><updated>2011-03-23T11:24:48.921-04:00</updated><title type='text'>Data Protection general practices</title><content type='html'>&lt;em&gt;&lt;strong&gt;Data Protection Layers:&lt;/strong&gt;&lt;/em&gt;&lt;br /&gt;- Application&lt;br /&gt;- Database&lt;br /&gt;- File System&lt;br /&gt;&lt;br /&gt;&lt;em&gt;&lt;strong&gt;Operation Cost Factors:&lt;/strong&gt;&lt;/em&gt;&lt;br /&gt;- Performance&lt;br /&gt;- Storage: data storage requirements&lt;br /&gt;- 
Security&lt;br /&gt;- Transparency: changes required to applications, and support for utilities.&lt;br /&gt;&lt;br /&gt;&lt;em&gt;&lt;strong&gt;Data Protection options:&lt;/strong&gt;&lt;/em&gt;&lt;br /&gt;- Clear: actual value  &lt;br /&gt;&lt;br /&gt;- Hash: unreadable, not reversible&lt;br /&gt;  A keyed hash (HMAC) provides strong protection.&lt;br /&gt;  Considerations: key rotation&lt;br /&gt;&lt;br /&gt;- Encryption: unreadable, reversible&lt;br /&gt;  Considerations: storage type, transparency to applications, key rotation&lt;br /&gt;&lt;br /&gt;- Format-controlling encryption: unreadable, reversible&lt;br /&gt;  Considerations: key rotation.&lt;br /&gt;&lt;br /&gt;- Replacement (tokens): unreadable, reversible&lt;br /&gt;  A proxy value is created to replace the original data.&lt;br /&gt;  Considerations: transparency for applications needing the original data.&lt;br /&gt;&lt;br /&gt;&lt;em&gt;&lt;strong&gt;Continuous data protection:&lt;/strong&gt;&lt;/em&gt;&lt;br /&gt;- Automatically saves a copy of every change made to the data. This allows the user or administrator to restore data to any point in time.&lt;br /&gt;&lt;br /&gt;- Advantage: Most continuous data protection solutions save byte or block-level differences rather than file-level differences. 
So if only a small portion of the data is written, saving just the changes requires less space on backup media.&lt;br /&gt;&lt;br /&gt;- Cost: introduces extra disk write operations and continuous network usage.</content><link rel='replies' type='application/atom+xml' href='http://qingzhang-tech.blogspot.com/feeds/2808227475320701956/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=25021721&amp;postID=2808227475320701956' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/25021721/posts/default/2808227475320701956'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/25021721/posts/default/2808227475320701956'/><link rel='alternate' type='text/html' href='http://qingzhang-tech.blogspot.com/2011/03/data-protection-general-practices.html' title='Data Protection general practices'/><author><name>Qing</name><uri>http://www.blogger.com/profile/10218094547702857298</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='23' src='http://1.bp.blogspot.com/-tNEFYH9qreE/Ti5j9FnwwbI/AAAAAAAABgE/AfRj3k447yM/s220/head.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-25021721.post-8587026330301944329</id><published>2011-03-14T01:05:00.004-04:00</published><updated>2011-03-14T22:18:41.053-04:00</updated><title type='text'>Failure Protection in Teradata</title><content type='html'>Let's first look at data allocation in Teradata. Basically, the OS recognizes logical units &lt;em&gt;(LUNs)&lt;/em&gt;, each of which is composed of slices (UNIX) or partitions (Windows/Linux) from each of the disk drives of a disk rank. 
Then the PDE translates the LUN into one or more &lt;em&gt;pdisks&lt;/em&gt;. The pdisks are then assigned to AMPs. All the logical disk space an AMP manages is called a &lt;em&gt;vdisk&lt;/em&gt;. In general, all pdisks from a rank are assigned to the same AMP.&lt;br /&gt;&lt;br /&gt;Failure protection in Teradata falls into several different levels:&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Disk drive level: RAID&lt;/strong&gt;&lt;br /&gt;RAID: Redundant Array of Independent (or Inexpensive) Disks.  &lt;br /&gt;The various RAID designs pursue two key goals: increasing data reliability and increasing input/output performance. There are six standard designs, RAID 1 to RAID 6, that provide fault tolerance (there is also the so-called RAID 0, which has no fault tolerance, and RAID 10, TBD).&lt;br /&gt;&lt;br /&gt;Teradata supports RAID 1 and RAID 5.&lt;br /&gt;- RAID 1 (mirroring without parity): Data is fully replicated on the mirror disk(s). Blocks are read from the first available disk. Besides failure protection, it also provides a read performance benefit.&lt;br /&gt;&lt;br /&gt;- RAID 5 (block-level striping with distributed parity): Data is striped across a rank of disks one segment at a time. Parity is also striped across all disk drives, interleaved with data. When a disk fails, data is reconstructed on the fly using the remaining data and parity.&lt;br /&gt;&lt;br /&gt;RAID 1 is generally faster than RAID 5, as the two (or more) disks can be read in parallel and no parity computation is needed. &lt;br /&gt;&lt;br /&gt;&lt;strong&gt;AMP level: Fallback tables&lt;/strong&gt;&lt;br /&gt;Fallback stores a second copy of each row of a table on a different AMP in the same cluster. It is specified during table creation. 
Fallback doubles the I/O on data modifications.&lt;br /&gt;&lt;br /&gt;Obviously the highest level of protection is RAID 1 with Fallback protection.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Component/Process Level: Journal&lt;/strong&gt;&lt;br /&gt;Journals are used for specific types of data or process recovery.&lt;br /&gt;&lt;em&gt;Recovery Journals:&lt;/em&gt; maintained by the system automatically. There are two different types:&lt;br /&gt;- transient journal: keeps a "before image" of changed rows so data can be restored to its previous state in case of an interrupted transaction. It is maintained on each AMP.&lt;br /&gt;- down AMP recovery journal: other AMPs in the cluster log the changes made to data that belongs to the failed AMP, and these changes are applied to the AMP once it has recovered.&lt;br /&gt;&lt;br /&gt;&lt;em&gt;Permanent Journals:&lt;/em&gt; optional; the user specifies them at table level, and they can store before images or after images to provide full-table recovery to a specific point in time.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Database Object Level: Locks&lt;/strong&gt;&lt;br /&gt;Applied at 3 different levels: Database/Table/Row Hash&lt;br /&gt;4 types:&lt;br /&gt;- Exclusive: at db/table level, used for DDL, blocks all other locks&lt;br /&gt;- Write: ensures data consistency while writing, only allows access locks&lt;br /&gt;- Read: ensures data consistency while reading, allows read/access locks&lt;br /&gt;- Access: allows reading while other users modify the data (so results may be inconsistent), conflicts only with exclusive locks.&lt;br /&gt;Local deadlocks are checked at AMP level, and global deadlocks are coordinated by the PE on a timed basis.</content><link rel='replies' type='application/atom+xml' href='http://qingzhang-tech.blogspot.com/feeds/8587026330301944329/comments/default' title='Post Comments'/><link rel='replies' type='text/html' 
href='http://www.blogger.com/comment.g?blogID=25021721&amp;postID=8587026330301944329' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/25021721/posts/default/8587026330301944329'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/25021721/posts/default/8587026330301944329'/><link rel='alternate' type='text/html' href='http://qingzhang-tech.blogspot.com/2011/03/data-protection-in-teradata.html' title='Failure Protection in Teradata'/><author><name>Qing</name><uri>http://www.blogger.com/profile/10218094547702857298</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='23' src='http://1.bp.blogspot.com/-tNEFYH9qreE/Ti5j9FnwwbI/AAAAAAAABgE/AfRj3k447yM/s220/head.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-25021721.post-553473499571361408</id><published>2011-03-10T01:12:00.004-05:00</published><updated>2011-03-23T11:22:03.646-04:00</updated><title type='text'>Top 10 Database Security Threats</title><content type='html'>Here's a brief digest of the document from Imperva Inc. For copyright reasons I won't post the original PDF here, but you can easily find it online.&lt;br /&gt;&lt;br /&gt;- &lt;em&gt;Threat 1 - Excessive Privilege Abuse: &lt;/em&gt;&lt;br /&gt;When users (or applications) are granted database access privileges that exceed the requirements of their job function, these privileges may be abused for malicious purposes. &lt;br /&gt;    &gt; Prevention - Query-Level Access Control&lt;br /&gt;* restricts database privileges to minimum-required SQL operations (SELECT, UPDATE, etc.) 
and data.&lt;br /&gt;* most databases offer some level of query-level access control (triggers, row-level security, etc.), but it is too time-consuming to set up manually.&lt;br /&gt;&lt;br /&gt;- &lt;em&gt;Threat 2 - Legitimate Privilege Abuse&lt;/em&gt;&lt;br /&gt;Users may also abuse legitimate database privileges for unauthorized purposes.&lt;br /&gt;   &gt; Prevention: Understanding the Context of Database Access&lt;br /&gt;Enforcing policies on client applications, time of day, location, etc. helps identify users who access the database in a suspicious manner.&lt;br /&gt;&lt;br /&gt;- &lt;em&gt;Threat 3 - Privilege Elevation&lt;/em&gt;&lt;br /&gt;Attackers may take advantage of database platform software vulnerabilities to convert access privileges from those of an ordinary user to those of an administrator. Vulnerabilities may be found in stored procedures, built-in functions, protocol implementations, and even SQL statements.&lt;br /&gt;    &gt; Prevention: Intrusion prevention systems (IPS) and Query-Level Access Control&lt;br /&gt;IPS inspects database traffic to identify patterns which correspond to known vulnerabilities.&lt;br /&gt;(If you find such a vulnerability, please report it to the DB vendor so it gets patched.)&lt;br /&gt;&lt;br /&gt;- &lt;em&gt;Threat 4 - Platform Vulnerabilities&lt;/em&gt;&lt;br /&gt;Vulnerabilities in underlying operating systems (Windows 2000, UNIX, etc.) and additional services installed on a database server may lead to unauthorized access, data corruption, or denial of service.&lt;br /&gt;    &gt; Prevention: Software Updates and Intrusion Prevention&lt;br /&gt;&lt;br /&gt;- &lt;em&gt;Threat 5 - SQL Injection&lt;/em&gt;&lt;br /&gt;A perpetrator inserts (or injects) unauthorized database statements into a vulnerable SQL data channel. 
&lt;br /&gt;&gt; Prevention: Three techniques can be combined to effectively combat SQL injection: intrusion prevention (IPS), query-level access control, and event correlation.&lt;br /&gt;&lt;br /&gt;- &lt;em&gt;Threat 6 - Weak Audit Trail&lt;/em&gt;&lt;br /&gt;Weakness may come from several aspects:&lt;br /&gt;Lack of user accountability when users access the database via web apps; degraded system performance; limited granularity, etc.&lt;br /&gt;&gt; Prevention: Increase performance; separation of duties, i.e. audit duties should ideally be separate from both database administrators and the database server platform; cross-platform auditing. Network-based audit appliances can help with all of these.&lt;br /&gt;&lt;br /&gt;- &lt;em&gt;Threat 7 - Denial of Service&lt;/em&gt;&lt;br /&gt;Access to network applications or data is denied to intended users. Resource overload is particularly common in database environments.&lt;br /&gt;&gt; Prevention: requires protections at multiple levels. In this database-specific context, deployment of connection rate control, IPS, query access control, and response timing control are recommended.&lt;br /&gt;&lt;br /&gt;- &lt;em&gt;Threat 8 - Database Communications Protocol Vulnerabilities&lt;/em&gt;&lt;br /&gt;&gt; Prevention: Protocol validation: parses (disassembles) database traffic and compares it to expectations. 
In&lt;br /&gt;&lt;br /&gt;- &lt;em&gt;Threat 9 - Weak Authentication&lt;/em&gt;&lt;br /&gt;Stealing or otherwise obtaining login credentials by means of: brute force, social engineering, credential theft.&lt;br /&gt;&gt; Prevention: strong authentication (in practice, strong passwords); directory integration, i.e. use a single set of login credentials across the enterprise.&lt;br /&gt;&lt;br /&gt;- &lt;em&gt;Threat 10 - Backup Data Exposure&lt;/em&gt; &lt;br /&gt;&gt; Prevention: database backups should be encrypted. (Addition: backups should be subject to the same security constraints as the original data.)&lt;br /&gt;&lt;br /&gt;In short, I would classify them into 4 categories:&lt;br /&gt;- Bugs in access control (privilege design/assignment)&lt;br /&gt;- Software/hardware/network vulnerabilities and attacks&lt;br /&gt;- User accountability&lt;br /&gt;- Backup data exposure&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/25021721-553473499571361408?l=qingzhang-tech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://qingzhang-tech.blogspot.com/feeds/553473499571361408/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=25021721&amp;postID=553473499571361408' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/25021721/posts/default/553473499571361408'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/25021721/posts/default/553473499571361408'/><link rel='alternate' type='text/html' href='http://qingzhang-tech.blogspot.com/2011/03/top-10-database-security-threats.html' title='Top 10 Database Security Threats'/><author><name>Qing</name><uri>http://www.blogger.com/profile/10218094547702857298</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='23' 
src='http://1.bp.blogspot.com/-tNEFYH9qreE/Ti5j9FnwwbI/AAAAAAAABgE/AfRj3k447yM/s220/head.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-25021721.post-431447072400527340</id><published>2011-02-05T21:50:00.001-05:00</published><updated>2011-02-05T21:53:04.100-05:00</updated><title type='text'>IPv4 addresses finally run out</title><content type='html'>2^32, and finally all used up. We're in a new era.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/25021721-431447072400527340?l=qingzhang-tech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://qingzhang-tech.blogspot.com/feeds/431447072400527340/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=25021721&amp;postID=431447072400527340' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/25021721/posts/default/431447072400527340'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/25021721/posts/default/431447072400527340'/><link rel='alternate' type='text/html' href='http://qingzhang-tech.blogspot.com/2011/02/ipv4-addresses-finally-run-out.html' title='IPv4 addresses finally run out'/><author><name>Qing</name><uri>http://www.blogger.com/profile/10218094547702857298</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='23' src='http://1.bp.blogspot.com/-tNEFYH9qreE/Ti5j9FnwwbI/AAAAAAAABgE/AfRj3k447yM/s220/head.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-25021721.post-1318532039200739979</id><published>2011-01-12T22:29:00.003-05:00</published><updated>2011-01-12T23:03:44.159-05:00</updated><title type='text'>Verizon's 2010 Data Breach Report</title><content type='html'>Went through the &lt;a 
href="http://www.verizonbusiness.com/resources/reports/rp_2010-data-breach-report_en_xg.pdf"&gt;Verizon 2010 Data Breach Investigation Report&lt;/a&gt; recently. Here is a digest of the things that are of interest to me:&lt;br /&gt;&lt;br /&gt;Their classification of types of breach:&lt;br /&gt;misuse, hacking, malware, social tactics, physical attacks.&lt;br /&gt;&lt;br /&gt;Harm done by external agents far outweighs that done by insiders and partners. External breaches are largely the work of organized criminals. Overall, insiders were not responsible for a large share of compromised records but system and network administrators nabbed most of those that were. This finding is not surprising since higher privileges offer greater opportunity for abuse. In general, we find that employees are granted more privileges than they need to perform their job duties and the activities of those that do require higher privileges are usually not monitored in any real way.&lt;br /&gt;&lt;br /&gt;Top 3 industries affected by data breaches: Financial Services, Hospitality, and Retail. The most commonly compromised data types are:&lt;br /&gt;- Payment card data&lt;br /&gt;- Personal information&lt;br /&gt;- Bank account&lt;br /&gt;- Authentication credentials&lt;br /&gt;&lt;br /&gt;Malware and hacking accounted for more than 95% of all compromised records. Cases involving the use of social tactics more than doubled. Physical attacks like theft, tampering, and surveillance ticked up several notches.&lt;br /&gt;&lt;br /&gt;Malware factored into 38% of 2009 breach cases and 94% of all data lost. The most frequent malware infection vector is installation or injection by a remote attacker. 
This is often accomplished through SQL injection or after the attacker has root access to a system.&lt;br /&gt;&lt;br /&gt;Malware functionality by percent of records:&lt;br /&gt;- Backdoor 85%&lt;br /&gt;- Send data to external site/entity 81%&lt;br /&gt;- Capture data resident on system 84%&lt;br /&gt;- System/network utilities (PsTools, Netcat) 83%&lt;br /&gt;- Packet sniffer 80%&lt;br /&gt;&lt;br /&gt;97% of the 140+ million records were compromised through customized malware. Some are simply repackaged versions of existing malware in order to avoid AV detection. More often they altered the code of existing malware or created something entirely new.&lt;br /&gt;&lt;br /&gt;Two hacking types that resulted in the largest percent of data breach:&lt;br /&gt;- Use of stolen credentials: 86%&lt;br /&gt;Mostly obtained by malware, at a ratio of 2:1 to other attacks including phishing and SQL injection.&lt;br /&gt;- SQL injection: 89%&lt;br /&gt;It is almost always an input validation failure. Its main uses are querying data, modifying data, and delivering malware.&lt;br /&gt;&lt;br /&gt;The most used path of intrusion is web applications.&lt;br /&gt;&lt;br /&gt;Nearly all data were breached from servers and applications. Breaches involving end-user devices nearly doubled in 2009. Much of this growth can be attributed to credential-capturing malware.&lt;br /&gt;&lt;br /&gt;15% of attacks are of high difficulty: advanced skills, significant customization, and/or extensive resources required. And they contribute to 87% of data breached.&lt;br /&gt;&lt;br /&gt;In 2009, targeted attacks accounted for 89% of records compromised.&lt;br /&gt;&lt;br /&gt;In over 60% of breaches investigated in 2009, it took days or longer for the attacker to successfully compromise data, but in 31% it took only minutes. In more than 37%, it took months to discover the compromise. 
Another 29% also took months to contain the compromise after it was discovered.&lt;br /&gt;&lt;br /&gt;Third party fraud detection is still the most common way breach victims come to know of their predicament.&lt;br /&gt;&lt;br /&gt;Previously, event monitoring and log analysis successfully alerted only 6% of breach victims; this year that figure has dropped to 4%. IDS usually fails because of poor configuration and monitoring. Actually 86% of the breaches have log evidence. Things to look for when studying logs: 1) abnormal increase in log data, 2) abnormal length of lines within logs, 3) absence of (or abnormal decrease in) log data.&lt;br /&gt;&lt;br /&gt;Anti-forensics consist of actions taken by the attacker to remove, hide, and corrupt evidence or otherwise foil post-incident investigations. Data wiping, which includes removal and deletion, is still the most common but declined slightly. Data hiding rose by over 50%, and data corruption tripled. The use of encryption for the purposes of hiding data contributed most significantly to the increase in that technique while the most common use of data corruption remains log tampering.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/25021721-1318532039200739979?l=qingzhang-tech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://qingzhang-tech.blogspot.com/feeds/1318532039200739979/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=25021721&amp;postID=1318532039200739979' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/25021721/posts/default/1318532039200739979'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/25021721/posts/default/1318532039200739979'/><link rel='alternate' type='text/html' 
href='http://qingzhang-tech.blogspot.com/2011/01/verizons-2010-data-breach-report.html' title='Verizon&apos;s 2010 Data Breach Report'/><author><name>Qing</name><uri>http://www.blogger.com/profile/10218094547702857298</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='23' src='http://1.bp.blogspot.com/-tNEFYH9qreE/Ti5j9FnwwbI/AAAAAAAABgE/AfRj3k447yM/s220/head.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-25021721.post-4175939296835106671</id><published>2010-12-16T21:22:00.005-05:00</published><updated>2011-01-13T00:42:39.327-05:00</updated><title type='text'>Techniques on large data analysis</title><content type='html'>Demand for analysis of large data sets is exploding nowadays. Recently I did a brief investigation of the techniques for large data analysis.&lt;br /&gt;&lt;br /&gt;1. Hashing&lt;br /&gt;- Usage:&lt;br /&gt;&amp;nbsp;&amp;nbsp; Fast searching, insertion, deletion. Data set often fits in memory.&lt;br /&gt;- Design:&lt;br /&gt;&amp;nbsp;&amp;nbsp; * hash function selection for different data types&lt;br /&gt;&amp;nbsp;&amp;nbsp; * collision resolution&lt;br /&gt;- Extension:&lt;br /&gt;&amp;nbsp;&amp;nbsp; d-left hashing: separate the hash table into d segments. Hash into the segment with fewer collisions. On a tie, choose the leftmost one.&lt;br /&gt;&lt;br /&gt;2. Bit map&lt;br /&gt;- Usage:&lt;br /&gt;&amp;nbsp;&amp;nbsp; Fast searching, duplicate checking, deletion.&lt;br /&gt;- Design:&lt;br /&gt;&amp;nbsp;&amp;nbsp; Use a bit array to represent data.&lt;br /&gt;- Extension:&lt;br /&gt;&amp;nbsp;&amp;nbsp; * Bloom filter: Use multiple bits to represent one data point.&lt;br /&gt;&lt;br /&gt;3. 
Bloom filter (I think it is worth a dedicated entry here)&lt;br /&gt;- Usage:&lt;br /&gt;&amp;nbsp;&amp;nbsp; Duplicate checking, set union, data dictionary&lt;br /&gt;- Design:&lt;br /&gt;&amp;nbsp;&amp;nbsp; k hash functions mapping to the same domain. 
So for one data item there will be k set bits.&lt;br /&gt;- Extension:&lt;br /&gt;&amp;nbsp;&amp;nbsp; Counting Bloom filter: expand each bit into a counter, so that it can support deletion.&lt;br /&gt;&lt;br /&gt;4. Heap&lt;br /&gt;- Usage:&lt;br /&gt;&amp;nbsp;&amp;nbsp;  Top N in a large data set.&lt;br /&gt;- Design: &lt;br /&gt;&amp;nbsp;&amp;nbsp; (do I need to write anything here?)&lt;br /&gt;- Extension:&lt;br /&gt;&amp;nbsp;&amp;nbsp; Use a max-heap and a min-heap at the same time to maintain the median&lt;br /&gt;&lt;br /&gt;5. Bucket&lt;br /&gt;- Usage: &lt;br /&gt;&amp;nbsp;&amp;nbsp;  Looking for the k-th element, the median, duplicate or non-duplicate elements&lt;br /&gt;- Design:&lt;br /&gt;&amp;nbsp;&amp;nbsp; * step 1: define the range of each bucket, count statistics in each bucket.&lt;br /&gt;&amp;nbsp;&amp;nbsp; * step 2: compute the target's index in the corresponding bucket, look for the target. The bucket may be split further if the remaining data is still too big.&lt;br /&gt;&lt;br /&gt;6. External Sorting&lt;br /&gt;- Usage:&lt;br /&gt;&amp;nbsp;&amp;nbsp; Ordering, duplicate removal&lt;br /&gt;- Design: (a sort-merge strategy)&lt;br /&gt;&amp;nbsp;&amp;nbsp; * In the sorting phase, chunks of data small enough to fit in main memory are read, sorted, and written out to a temporary file. &lt;br /&gt;&amp;nbsp;&amp;nbsp; * In the merge phase, the sorted subfiles are combined into a single larger file.&lt;br /&gt;&lt;br /&gt;7. Inverted Index&lt;br /&gt;- Usage:&lt;br /&gt;&amp;nbsp;&amp;nbsp;  Allows fast full-text searches. It is the most popular data structure used in document retrieval systems such as search engines.&lt;br /&gt;- Design:&lt;br /&gt;&amp;nbsp;&amp;nbsp;  A mapping from content, such as words or numbers, to its locations in a database file, or in a document or a set of documents.&lt;br /&gt;&lt;br /&gt;8. 
Trie&lt;br /&gt;- Usage: &lt;br /&gt;&amp;nbsp;&amp;nbsp; * Data with similar structure and lots of duplicates.&lt;br /&gt;&amp;nbsp;&amp;nbsp; * Storing a dictionary or implementing approximate matching algorithms&lt;br /&gt;- Design&lt;br /&gt;&amp;nbsp;&amp;nbsp; Often uses a recursive definition for prefixes&lt;br /&gt;- Extension&lt;br /&gt;&amp;nbsp;&amp;nbsp; Compression&lt;br /&gt;&lt;br /&gt;9. Database index&lt;br /&gt;(Classic, should I explain here?)&lt;br /&gt;&lt;br /&gt;10. MapReduce&lt;br /&gt;- Usage:&lt;br /&gt;&amp;nbsp;&amp;nbsp;  Large dataset with limited data types. Distributed processing.&lt;br /&gt;- Design&lt;br /&gt;&amp;nbsp;&amp;nbsp; * "Map" step: The master node takes the input, chops it up into smaller sub-problems, and distributes those to worker nodes.&lt;br /&gt;&amp;nbsp;&amp;nbsp; * "Reduce" step: The master node then takes the answers to all the sub-problems and combines them in some way to get the output.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/25021721-4175939296835106671?l=qingzhang-tech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://qingzhang-tech.blogspot.com/feeds/4175939296835106671/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=25021721&amp;postID=4175939296835106671' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/25021721/posts/default/4175939296835106671'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/25021721/posts/default/4175939296835106671'/><link rel='alternate' type='text/html' href='http://qingzhang-tech.blogspot.com/2010/12/techniques-on-large-data-analysis.html' title='Techniques on large data analysis'/><author><name>Qing</name><uri>http://www.blogger.com/profile/10218094547702857298</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='23' src='http://1.bp.blogspot.com/-tNEFYH9qreE/Ti5j9FnwwbI/AAAAAAAABgE/AfRj3k447yM/s220/head.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-25021721.post-2548213082508464265</id><published>2010-12-04T11:28:00.004-05:00</published><updated>2010-12-05T16:02:57.545-05:00</updated><title type='text'>Groupon Turns Down Google’s Takeover Bid</title><content type='html'>6 billion, and Groupon says &lt;a href="http://gigaom.com/2010/12/03/groupon-turns-down-googles-takeover-bid/"&gt;NO&lt;/a&gt;. I think for Google it may be a good thing. As for Groupon, it says it's considering staying independent for an IPO. I really doubt this 6-billion decision, and its growth estimate as an independent company, considering its easy-to-replicate model and its loose ads connection with "Local" business. I still think they should go with Facebook.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/25021721-2548213082508464265?l=qingzhang-tech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://qingzhang-tech.blogspot.com/feeds/2548213082508464265/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=25021721&amp;postID=2548213082508464265' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/25021721/posts/default/2548213082508464265'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/25021721/posts/default/2548213082508464265'/><link rel='alternate' type='text/html' href='http://qingzhang-tech.blogspot.com/2010/12/groupon-turns-down-googles-takeover-bid.html' title='Groupon Turns Down Google’s Takeover 
Bid'/><author><name>Qing</name><uri>http://www.blogger.com/profile/10218094547702857298</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='23' src='http://1.bp.blogspot.com/-tNEFYH9qreE/Ti5j9FnwwbI/AAAAAAAABgE/AfRj3k447yM/s220/head.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-25021721.post-4603974477691469268</id><published>2010-11-23T13:10:00.002-05:00</published><updated>2010-11-23T13:23:09.978-05:00</updated><title type='text'>Google to buy Groupon?</title><content type='html'>Word came out these days that &lt;a href="http://www.webpronews.com/topnews/2010/11/19/acquisition-rumors-link-google-groupon"&gt;Google is going to buy Groupon&lt;/a&gt;. Some &lt;a href="http://www.businessinsider.com/why-a-google-groupon-deal-would-be-a-huge-winner-2010-11"&gt;analysis&lt;/a&gt; of this basically suggests that Google can integrate the Groupon ads into its search results, YouTube videos, etc. It's kind of a surprise to me, as I had thought Facebook would be the final buyer, especially as Facebook is putting special effort into location-based services. That also looked promising to me, as Groupon is essentially a collective buying site, and if integrated with Facebook's social model, it's easy to add recommendation/reputation context on top of that. Maybe Facebook is &lt;a href="http://www.nacsonline.com/NACS/News/Daily/Pages/ND1108105.aspx"&gt;doing that&lt;/a&gt; on their own? 
Anyway, let's sit back and see how things go.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/25021721-4603974477691469268?l=qingzhang-tech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://qingzhang-tech.blogspot.com/feeds/4603974477691469268/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=25021721&amp;postID=4603974477691469268' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/25021721/posts/default/4603974477691469268'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/25021721/posts/default/4603974477691469268'/><link rel='alternate' type='text/html' href='http://qingzhang-tech.blogspot.com/2010/11/google-to-buy-groupon.html' title='Google to buy Groupon?'/><author><name>Qing</name><uri>http://www.blogger.com/profile/10218094547702857298</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='23' src='http://1.bp.blogspot.com/-tNEFYH9qreE/Ti5j9FnwwbI/AAAAAAAABgE/AfRj3k447yM/s220/head.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-25021721.post-2828543307531686989</id><published>2010-11-15T01:26:00.004-05:00</published><updated>2010-12-16T21:21:58.781-05:00</updated><title type='text'>privacy, privacy, cost of privacy ...</title><content type='html'>In social networks, privacy is always one of the major concerns. 
Google just got another lesson for their negligence(?): &lt;a href="http://tech.fortune.cnn.com/2010/11/02/buzzkill-google-settles-google-buzz-privacy-suit-for-8million-donation/"&gt;Google settles Google Buzz privacy suit for $8.5 million donation&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/25021721-2828543307531686989?l=qingzhang-tech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://qingzhang-tech.blogspot.com/feeds/2828543307531686989/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=25021721&amp;postID=2828543307531686989' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/25021721/posts/default/2828543307531686989'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/25021721/posts/default/2828543307531686989'/><link rel='alternate' type='text/html' href='http://qingzhang-tech.blogspot.com/2010/11/privacy-privacy-cost-of-privacy.html' title='privacy, privacy, cost of privacy ...'/><author><name>Qing</name><uri>http://www.blogger.com/profile/10218094547702857298</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='23' src='http://1.bp.blogspot.com/-tNEFYH9qreE/Ti5j9FnwwbI/AAAAAAAABgE/AfRj3k447yM/s220/head.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-25021721.post-8959855321653152057</id><published>2010-11-07T01:32:00.008-04:00</published><updated>2010-11-07T11:57:12.954-05:00</updated><title type='text'>Era of Big Data</title><content type='html'>Since the acquisition of DATAllegro by Microsoft in 2008, there have been more big moves this year: EMC bought Greenplum, IBM bought Netezza, and Oracle upgraded Exadata (interestingly, this was announced by Mark Hurd, who had joined 
Oracle as its new president no more than half a month earlier). All of these highlight the arrival of the era of big data, and the strong desire of the industry's leading companies to expand their large data management and business analysis offerings.&lt;br /&gt;Here are several interesting links:&lt;br /&gt;&lt;a href="http://gigaom.com/cloud/ibm-to-buy-netezza-for-1-7-billion/"&gt;Big Data Means Big Sales&lt;/a&gt;&lt;br /&gt;&lt;a href="http://www.zdnet.com/blog/btl/emcs-launches-greenplum-appliance/40281"&gt;EMC's launches Greenplum appliance&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Big data comes not only from the quickly expanding web activities we encounter every day on Facebook, Twitter, Amazon, etc. The traditional industries are also generating huge amounts of data every minute with the help of the latest technologies. Sensor and RFID technology has been widely used by giant companies such as Walmart and Target to help collect data and enable more intelligent supply chain management. I also came across the following vision:&lt;br /&gt;&lt;a href="http://gigaom.com/cloud/sensor-networks-top-social-networks-for-big-data-2/"&gt;Sensor Networks Top Social Networks for Big Data&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;As envisioned, we will soon be entering the exabyte and zettabyte age, within the next couple of years. 
The imminent big data era calls for more aggressive advances in big data management and sharing, more intelligent and effective business analytics, and the security and privacy primitives associated with all of them.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/25021721-8959855321653152057?l=qingzhang-tech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://qingzhang-tech.blogspot.com/feeds/8959855321653152057/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=25021721&amp;postID=8959855321653152057' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/25021721/posts/default/8959855321653152057'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/25021721/posts/default/8959855321653152057'/><link rel='alternate' type='text/html' href='http://qingzhang-tech.blogspot.com/2010/11/era-of-big-data.html' title='Era of Big Data'/><author><name>Qing</name><uri>http://www.blogger.com/profile/10218094547702857298</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='23' src='http://1.bp.blogspot.com/-tNEFYH9qreE/Ti5j9FnwwbI/AAAAAAAABgE/AfRj3k447yM/s220/head.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-25021721.post-189617310054930759</id><published>2010-11-06T18:09:00.007-04:00</published><updated>2010-11-07T00:23:57.551-04:00</updated><title type='text'>Access Control</title><content type='html'>After a user is authenticated and logged on to a system, their access to resources on a computer or network system is controlled by access control modules. 
&lt;br /&gt;&lt;br /&gt;&lt;em&gt;&lt;strong&gt;Discretionary Access Control (DAC)&lt;/strong&gt;&lt;/em&gt;&lt;br /&gt;In a DAC model, a subject has complete control over the objects that it owns and the programs that it executes. The owner associates each of its objects with an access control list (ACL), containing a list of users and their level of access to this object. DAC is based on the owner's granting and revoking of privileges. Access to a resource is denied by default unless explicitly authorized. Most of today's operating systems use the DAC model.&lt;br /&gt;&lt;br /&gt;The key weakness of DAC is that it is vulnerable to Trojan horse attacks.&lt;br /&gt;&lt;br /&gt;&lt;em&gt;&lt;strong&gt;Mandatory Access Control (MAC)&lt;/strong&gt;&lt;/em&gt;&lt;br /&gt;MAC is the strictest of all levels of control. The MAC model targets systems in which confidentiality has the highest priority, such as military or government agencies. In a MAC-enforced system, both subjects and objects are assigned clearance levels (security labels). The administrator takes control of security label definition and assignment. Access to objects is constrained by policies on the security clearance, which are also defined by the administrator. The general access rule is no read up, no write down, following the Bell-LaPadula model, but it's also possible to extend it and define dedicated rules depending on the practical security requirements. MAC is fine-grained and can provide row or column level access control.&lt;br /&gt;&lt;br /&gt;Often seen as the most secure access control environment, MAC also requires extra effort in pre-planning in order to be effectively and securely implemented. 
It also calls for continuous system management overhead to control new users, objects, and changes of security label definitions.&lt;br /&gt;&lt;br /&gt;Oracle 9i has implemented label security to meet the MAC requirements and provide row level access control, and hierarchy labels are coded as numeric values. 
DB2 also provides LBAC to offer MAC at both row and column level. The security label is composed of one or more security label components of three types: arrays (hierarchy), sets and trees.&lt;br /&gt;&lt;br /&gt;&lt;em&gt;&lt;strong&gt;Role Based Access Control (RBAC)&lt;/strong&gt;&lt;/em&gt;&lt;br /&gt;In an RBAC system, users also don't have discretionary access to objects. Instead, the administrator creates roles, each with a collection of permissions for a different job function or responsibility. Each user is assigned to one or more roles and is delegated all the privileges associated with those roles. RBAC greatly simplifies the management of individual user rights and authorizations.&lt;br /&gt;&lt;br /&gt;Many database systems have some implementation of RBAC, including Teradata, Oracle, DB2, and SQL Server.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/25021721-189617310054930759?l=qingzhang-tech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://qingzhang-tech.blogspot.com/feeds/189617310054930759/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=25021721&amp;postID=189617310054930759' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/25021721/posts/default/189617310054930759'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/25021721/posts/default/189617310054930759'/><link rel='alternate' type='text/html' href='http://qingzhang-tech.blogspot.com/2010/11/access-control.html' title='Access Control'/><author><name>Qing</name><uri>http://www.blogger.com/profile/10218094547702857298</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='23' 
src='http://1.bp.blogspot.com/-tNEFYH9qreE/Ti5j9FnwwbI/AAAAAAAABgE/AfRj3k447yM/s220/head.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-25021721.post-3212973139231523822</id><published>2010-11-04T17:46:00.006-04:00</published><updated>2010-11-07T00:20:59.842-04:00</updated><title type='text'>Shared Nothing Architecture</title><content type='html'>A shared nothing system greatly reduces contention for memory, locks, and processors. As pointed out by DeWitt et al., among the three widely used approaches, shared memory is the least scalable, shared disk is in the middle, and shared nothing is the most scalable. A shared nothing system can scale almost linearly, and in principle without a fixed limit, simply by adding more inexpensive nodes. Shared nothing is now prevalent in the data warehousing space due to its potential for scaling. &lt;br /&gt;&lt;br /&gt;In Teradata, one of the earliest implementations, each AMP virtual processor (vproc) manages its own dedicated portion of the system's disk space (a vdisk, which can span multiple disk array ranks). Rows are distributed to the AMPs according to the hash of the primary index (PI). For NoPI tables, supported since TD 13.0, the system either hashes on the Query ID for a row or uses a different algorithm to assign the row to its home AMP. This unconditional parallelism and linear expandability underpin Teradata's leading position in enterprise data warehousing.&lt;br /&gt;&lt;br /&gt;Nowadays the shared nothing architecture is adopted by most high-performance scalable DBMSs, including Teradata, Netezza, Greenplum, DB2, and Vertica. It is also used by most of the high-end e-commerce platforms, including Amazon, Yahoo, Google, and Facebook.&lt;br /&gt;&lt;br /&gt;In DB2 UDB Enterprise-Extended Edition (EEE), the partition key consists of one or more columns, and the hash of the partition key determines which node or node group a row is sent to.&lt;br /&gt;&lt;br /&gt;Oracle takes a shared-disk approach. 
In Oracle, shared nothing exists only at the logical level. Once the degree of parallelism is chosen as a power of 2, the number of partitions is determined and the partitions are generated by range-hash partitioning.&lt;br /&gt;&lt;br /&gt;Cons: Shared nothing architectures take longer to respond to queries that involve joins over large data sets from different partitions. For example, OLTP is not efficient in Teradata: CPU cycles are distributed across several AMPs and PEs, and the PEs can easily become congested by massive numbers of OLTP requests.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/25021721-3212973139231523822?l=qingzhang-tech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://qingzhang-tech.blogspot.com/feeds/3212973139231523822/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=25021721&amp;postID=3212973139231523822' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/25021721/posts/default/3212973139231523822'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/25021721/posts/default/3212973139231523822'/><link rel='alternate' type='text/html' href='http://qingzhang-tech.blogspot.com/2010/11/shared-nothing-architecture.html' title='Shared Nothing Architecture'/><author><name>Qing</name><uri>http://www.blogger.com/profile/10218094547702857298</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='23' src='http://1.bp.blogspot.com/-tNEFYH9qreE/Ti5j9FnwwbI/AAAAAAAABgE/AfRj3k447yM/s220/head.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-25021721.post-9046134356876158686</id><published>2010-11-02T00:17:00.007-04:00</published><updated>2010-11-07T00:08:12.598-04:00</updated><title type='text'>Always On - Aster Data 
example</title><content type='html'>On June 29th, 2010, Google's AdWords stopped serving ads at around 1:40pm PST, and the outage lasted for about 3 hours. The estimated cost was about $7.8 million. For Amazon or eBay, even if some shoppers come back later, they still lose impulse buyers, which amounts to millions per hour. &lt;br /&gt;&lt;br /&gt;Zero-downtime practices are now widely deployed for data migration, but for databases and data warehouses this is still a challenging problem. In general, system downtime can be classified as planned or unplanned. As 24x7 availability becomes more and more critical for data warehouse systems, the system is expected to stay on through events that would otherwise cause planned or unplanned downtime. &lt;br /&gt;&lt;br /&gt;Aster Data claims to have built its solution on Recovery-Oriented Computing to achieve this goal. The basic functionalities include:&lt;br /&gt;&lt;br /&gt;- &lt;em&gt;In-cluster replication and transparent fail-over&lt;/em&gt;&lt;br /&gt;  Data replicas are placed across the cluster, and when a server fails, its work is transparently transferred to replicas within the cluster.&lt;br /&gt;&lt;br /&gt;- &lt;em&gt;Self diagnostics&lt;/em&gt;&lt;br /&gt;  On a permanent failure, new replicas are created on existing or new servers without downtime. 
On a transient failure, the replica is resynced after the server recovers.&lt;br /&gt;&lt;br /&gt;- &lt;em&gt;Network aggregation&lt;/em&gt;&lt;br /&gt;  Multiple network hardware components provide parallelism and redundancy.&lt;br /&gt;&lt;br /&gt;- &lt;em&gt;Separation of duty&lt;/em&gt; &lt;br /&gt;  Dedicated servers handle data loading/exporting and backup/restore.&lt;br /&gt;&lt;br /&gt;- &lt;em&gt;Workload prediction&lt;/em&gt;&lt;br /&gt;  Policy-driven tools manage workload priorities and dynamically assign resources.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/25021721-9046134356876158686?l=qingzhang-tech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://qingzhang-tech.blogspot.com/feeds/9046134356876158686/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=25021721&amp;postID=9046134356876158686' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/25021721/posts/default/9046134356876158686'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/25021721/posts/default/9046134356876158686'/><link rel='alternate' type='text/html' href='http://qingzhang-tech.blogspot.com/2010/11/always-on-aster-data-example.html' title='Always On - Aster Data example'/><author><name>Qing</name><uri>http://www.blogger.com/profile/10218094547702857298</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='23' src='http://1.bp.blogspot.com/-tNEFYH9qreE/Ti5j9FnwwbI/AAAAAAAABgE/AfRj3k447yM/s220/head.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-25021721.post-2392056594789904785</id><published>2010-10-31T01:34:00.003-04:00</published><updated>2010-11-07T00:21:17.565-04:00</updated><title type='text'>Column Store 
DBMS</title><content type='html'>As the name suggests, contents are stored by column. Sybase may be the earliest commercial product implementing a column store. &lt;a href="http://www.vertica.com/"&gt;Vertica&lt;/a&gt;, which grew out of the academic C-Store project, is now a mature commercial product.&lt;br /&gt;&lt;br /&gt;Benefits and challenges:&lt;br /&gt;- Query optimization: for queries that add a new column for all rows or aggregate over only a few columns, a column store is much more effective. It is especially advantageous for data warehouses, as queries often touch only specific dimensions of the data.&lt;br /&gt;- High compression is possible as columns of homogeneous datatype are stored together.&lt;br /&gt;&lt;br /&gt;In general, OLTP is more row-oriented, while OLAP is more column-oriented. A combined row and column store offers exceptional benefits and challenges in the areas mentioned above. The system-level design also gives more opportunities for parallelism and failure recovery if different storage mechanisms are used in RAID or fallback protection. 
&lt;a href="http://www.asterdata.com/"&gt;Aster Data&lt;/a&gt;, another successful startup, claims to have a hybrid row and column store architecture.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/25021721-2392056594789904785?l=qingzhang-tech.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://qingzhang-tech.blogspot.com/feeds/2392056594789904785/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=25021721&amp;postID=2392056594789904785' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/25021721/posts/default/2392056594789904785'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/25021721/posts/default/2392056594789904785'/><link rel='alternate' type='text/html' href='http://qingzhang-tech.blogspot.com/2010/10/column-store-dbms.html' title='Column Store DBMS'/><author><name>Qing</name><uri>http://www.blogger.com/profile/10218094547702857298</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='23' src='http://1.bp.blogspot.com/-tNEFYH9qreE/Ti5j9FnwwbI/AAAAAAAABgE/AfRj3k447yM/s220/head.jpg'/></author><thr:total>0</thr:total></entry></feed>
