Usage examples
Here you can find snippets that can help in building fast and simple data pipelines with PCloud.jl. These snippets are not necessarily the best ways to solve each problem, but they can serve as a starting point. They also illustrate how to combine Julia techniques such as broadcasting and anonymous functions with pCloud to achieve goals without much effort.
Uploading and downloading CSV
CSV is a rather common format for storing data, and CSV.jl provides the convenient function CSV.write, which can write data to an IOBuffer that can in turn be uploaded to pCloud.
Let's create a DataFrame:
using CSV
using DataFrames
using Random
df = DataFrame(x = rand(10), y = rand(1:10, 10), z = [randstring(5) for _ in 1:10])
# 10×3 DataFrame
# │ Row │ x │ y │ z │
# │ │ Float64 │ Int64 │ String │
# ├─────┼───────────┼───────┼────────┤
# │ 1 │ 0.0756344 │ 6 │ H3BIk │
# │ 2 │ 0.396882 │ 5 │ Rv2SB │
# │ 3 │ 0.797529 │ 5 │ M61Hw │
# │ 4 │ 0.856915 │ 5 │ jLc7K │
# │ 5 │ 0.0120147 │ 1 │ HgZMA │
# │ 6 │ 0.493593 │ 3 │ ENfu3 │
# │ 7 │ 0.27618 │ 2 │ MIU5B │
# │ 8 │ 0.492329 │ 10 │ QflU7 │
# │ 9 │ 0.398613 │ 10 │ 4XioP │
# │ 10 │ 0.40273 │ 10 │ PQs14 │
To store this DataFrame in pCloud, we write its contents to an IOBuffer and upload the resulting buffer with the uploadfile function:
using PCloud
using PCloud: uploadfile, getfilelink
token = "YOUR_TOKEN" # replace with your pCloud auth token
client = PCloudClient(auth_token = token)
buf = CSV.write(IOBuffer(), df)
res = uploadfile(client, files = "data.csv" => buf)
The returned response res contains the necessary information about the resulting file. To get the file back, we can use getfilelink:
using UrlDownload
using Underscores
df2 = @_ getfilelink(client, fileid = first(res.fileids)) |>
urldownload("https://" * first(__.hosts) * __.path) |> DataFrame
# 10×3 DataFrame
# │ Row │ x │ y │ z │
# │ │ Float64 │ Int64 │ String │
# ├─────┼───────────┼───────┼────────┤
# │ 1 │ 0.0756344 │ 6 │ H3BIk │
# │ 2 │ 0.396882 │ 5 │ Rv2SB │
# │ 3 │ 0.797529 │ 5 │ M61Hw │
# │ 4 │ 0.856915 │ 5 │ jLc7K │
# │ 5 │ 0.0120147 │ 1 │ HgZMA │
# │ 6 │ 0.493593 │ 3 │ ENfu3 │
# │ 7 │ 0.27618 │ 2 │ MIU5B │
# │ 8 │ 0.492329 │ 10 │ QflU7 │
# │ 9 │ 0.398613 │ 10 │ 4XioP │
# │ 10 │ 0.40273 │ 10 │ PQs14 │
Working with compressed CSV
Since CSV files can be rather large, it is common practice to compress them before uploading. This can be done as follows (assuming the same df as in the previous example):
using CodecZlib
buf = CSV.write(IOBuffer(), df) |> seekstart |> GzipCompressorStream
res = uploadfile(client, files = "data.csv.gz" => buf)
Note that we should use seekstart here: after the IOBuffer is written, its pointer is located at the end, and a subsequent read of the buffer in uploadfile returns an empty array. Also, this example uses GzipCompressorStream, but any other compression algorithm can be used; refer to TranscodingStreams.jl.
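The effect of seekstart can be seen with a plain IOBuffer, without CSV.jl or pCloud. The following is a minimal sketch that uses only Julia Base; the CSV-like string is made up for illustration.

```julia
# Write some illustrative (made-up) CSV text into an in-memory buffer.
io = IOBuffer()
write(io, "x,y\n1,2\n")

# The position is now at the end of the buffer, so reading yields no bytes.
empty_read = read(io)

# Rewind to the beginning, then read again to get the full contents.
seekstart(io)
full_read = String(read(io))
```

Here empty_read is an empty byte vector, while full_read holds the text that was written.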
To verify the result of the upload:
using UrlDownload
using Underscores
df2 = @_ getfilelink(client, fileid = first(res.fileids)) |>
urldownload("https://" * first(__.hosts) * __.path) |> DataFrame
# 10×3 DataFrame
# │ Row │ x │ y │ z │
# │ │ Float64 │ Int64 │ String │
# ├─────┼───────────┼───────┼────────┤
# │ 1 │ 0.0756344 │ 6 │ H3BIk │
# │ 2 │ 0.396882 │ 5 │ Rv2SB │
# │ 3 │ 0.797529 │ 5 │ M61Hw │
# │ 4 │ 0.856915 │ 5 │ jLc7K │
# │ 5 │ 0.0120147 │ 1 │ HgZMA │
# │ 6 │ 0.493593 │ 3 │ ENfu3 │
# │ 7 │ 0.27618 │ 2 │ MIU5B │
# │ 8 │ 0.492329 │ 10 │ QflU7 │
# │ 9 │ 0.398613 │ 10 │ 4XioP │
# │ 10 │ 0.40273 │ 10 │ PQs14 │
Uploading a generated image
In this example we use Luxor.jl to generate an image and getfilepublink to create a public link to the resulting image.
using Luxor
d = Drawing(600, 400, :png)
origin()
background("white")
for θ in range(0, step=π/8, length=16)
    gsave()
    scale(0.25)
    rotate(θ)
    translate(250, 0)
    randomhue()
    julialogo(action=:fill, color=false)
    grestore()
end
gsave()
scale(0.3)
juliacircles()
grestore()
translate(200, -150)
scale(0.3)
julialogo()
finish()
Note that we passed the :png symbol to Drawing to force in-memory image processing.
using PCloud
using PCloud: uploadfile, getfilepublink
token = "YOUR_TOKEN" # replace with your pCloud auth token
client = PCloudClient(auth_token = token)
res = uploadfile(client, files = "logo.png" => d.buffer)
getfilepublink(client, fileid = first(res.fileids)).link
# "https://u.pcloud.link/publink/show?code=XZh8FEkZ6vBed7DI1Wys8g7BHl8FFVuhUSSX"
If you follow this link, you can see that it is a valid PNG file.
Project Gutenberg and downloadfile
The PCloud.downloadfile method can download files from URLs directly to pCloud. This is useful during web crawling, when information of interest should be saved for later investigation. As an example, we download the top 10 books from Project Gutenberg:
using Underscores
using Gumbo
using Cascadia
using Cascadia: matchFirst
using UrlDownload
using PCloud
using PCloud: createfolder, downloadfile
token = "YOUR_TOKEN" # replace with your pCloud auth token
client = PCloudClient(auth_token = token)
# folder where books will be stored
folderid = createfolder(client, folderid = 0, name = "Gutenberg").metadata.folderid
host = "https://www.gutenberg.org"
# helper function for parsing data downloaded by `urldownload` to a more useful format
pageparser(x) = parsehtml(String(x)).root
# helper function which finds download url on each book page
# should be used for parsing each individual book page, for example
# getlink("https://www.gutenberg.org/ebooks/1342") would produce url to
# "Pride and Prejudice" in epub format.
getlink(url) = @_ urldownload(url, parser = pageparser) |>
host*matchFirst(sel"a[type='application/epub+zip']", __).attributes["href"]
# This is the central pipeline: it parses the top scores page, extracts the top 10 books,
# extracts the download url for each book with the help of `getlink`, and finally
# downloads everything to pCloud
@_ urldownload("https://www.gutenberg.org/browse/scores/top", parser = pageparser) |>
matchFirst(sel"ol", __) |> eachmatch(sel"li", __)[1:10] |>
matchFirst.(Ref(sel"a"), __) |> map(host*_.attributes["href"], __) |>
getlink.(__) |> join(__, " ") |>
downloadfile(client, url = __, folderid = folderid)
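The pipeline above leans on anonymous functions, broadcasting with Ref, and join. The following is a minimal sketch of those three steps using only Julia Base. The hrefs and allowed vectors are hypothetical stand-ins for scraped values, not real results, and nothing here calls pCloud or Project Gutenberg.

```julia
host = "https://www.gutenberg.org"

# Hypothetical relative links, standing in for the `href` attributes scraped from the page.
hrefs = ["/ebooks/1342", "/ebooks/84", "/ebooks/11"]

# An anonymous function applied with `map`: prepend the host to every link.
pages = map(h -> host * h, hrefs)

# Dot-broadcasting applies a function to every element; `Ref` keeps its
# argument fixed, so the whole `allowed` vector is reused for each element.
allowed = ["/ebooks/1342", "/ebooks/11"]  # hypothetical filter list
is_allowed = in.(hrefs, Ref(allowed))

# Join all urls into a single space-separated string, as the pipeline does
# before handing the result to `downloadfile`.
joined = join(pages, " ")
```

Spelling the steps out this way is more verbose than the `@_` form, but it makes each intermediate value easy to inspect in the REPL.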