Create Athena Tables

Now we can create our Amazon Athena tables. We have two options for this - one would be to have AWS Glue crawl the data and discover the schema - since we’ve already done this once we’ll save the time of running a Glue crawler and instead manually create the tables and schemas.

First we’ll go to the Athena console and run:

create database socialanalyticsblog

Now lets create a few tables.

Be sure to replace `` with the corresponding output from our CloudFormation console.
CREATE EXTERNAL TABLE socialanalyticsblog.tweets (
    coordinates STRUCT<
        type: STRING,
        coordinates: ARRAY<
            DOUBLE
        >
    >,
    retweeted BOOLEAN,
    source STRING,
    entities STRUCT<
        hashtags: ARRAY<
            STRUCT<
                text: STRING,
                indices: ARRAY<
                    BIGINT
                >
            >
        >,
        urls: ARRAY<
            STRUCT<
                url: STRING,
                expanded_url: STRING,
                display_url: STRING,
                indices: ARRAY<
                    BIGINT
                >
            >
        >
    >,
    reply_count BIGINT,
    favorite_count BIGINT,
    geo STRUCT<
        type: STRING,
        coordinates: ARRAY<
            DOUBLE
        >
    >,
    id_str STRING,
    timestamp_ms BIGINT,
    truncated BOOLEAN,
    text STRING,
    retweet_count BIGINT,
    id BIGINT,
    possibly_sensitive BOOLEAN,
    filter_level STRING,
    created_at STRING,
    place STRUCT<
        id: STRING,
        url: STRING,
        place_type: STRING,
        name: STRING,
        full_name: STRING,
        country_code: STRING,
        country: STRING,
        bounding_box: STRUCT<
            type: STRING,
            coordinates: ARRAY<
                ARRAY<
                    ARRAY<
                        FLOAT
                    >
                >
            >
        >
    >,
    favorited BOOLEAN,
    lang STRING,
    in_reply_to_screen_name STRING,
    is_quote_status BOOLEAN,
    in_reply_to_user_id_str STRING,
    user STRUCT<
        id: BIGINT,
        id_str: STRING,
        name: STRING,
        screen_name: STRING,
        location: STRING,
        url: STRING,
        description: STRING,
        translator_type: STRING,
        protected: BOOLEAN,
        verified: BOOLEAN,
        followers_count: BIGINT,
        friends_count: BIGINT,
        listed_count: BIGINT,
        favourites_count: BIGINT,
        statuses_count: BIGINT,
        created_at: STRING,
        utc_offset: BIGINT,
        time_zone: STRING,
        geo_enabled: BOOLEAN,
        lang: STRING,
        contributors_enabled: BOOLEAN,
        is_translator: BOOLEAN,
        profile_background_color: STRING,
        profile_background_image_url: STRING,
        profile_background_image_url_https: STRING,
        profile_background_tile: BOOLEAN,
        profile_link_color: STRING,
        profile_sidebar_border_color: STRING,
        profile_sidebar_fill_color: STRING,
        profile_text_color: STRING,
        profile_use_background_image: BOOLEAN,
        profile_image_url: STRING,
        profile_image_url_https: STRING,
        profile_banner_url: STRING,
        default_profile: BOOLEAN,
        default_profile_image: BOOLEAN
    >,
    quote_count BIGINT
) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '<TwitterRawLocation>';

Next we’ll create tables for the tweet entities and tweet sentiments.

Remember to replace `` and `` with the corresponding outputs from CloudFormation.
CREATE EXTERNAL TABLE socialanalyticsblog.tweet_entities (
    tweetid BIGINT,
    entity STRING,
    type STRING,
    score DOUBLE
) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '<TwitterEntitiesLocation>';
CREATE EXTERNAL TABLE socialanalyticsblog.tweet_sentiments (
    tweetid BIGINT,
    text STRING,
    originalText STRING,
    sentiment STRING,
    sentimentPosScore DOUBLE,
    sentimentNegScore DOUBLE,
    sentimentNeuScore DOUBLE,
    sentimentMixedScore DOUBLE
) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '<TwitterSentimentLocation>'

Now we can write queries against our tweet data! Let’s see what comprehend discovered by grouping all of the tweets by the recognized entity types.

select type, count(*) cnt from socialanalyticsblog.tweet_entities
group by type order by cnt desc